Abstract

We develop an information-theoretic view of the stochastic block model, a popular statistical 
model for the large-scale structure of complex networks. A graph G from such a model is 
generated by first assigning vertex labels at random from a finite alphabet, and then connecting 
vertices with edge probabilities depending on the labels of the endpoints. In the case of the 
symmetric two-group model, we establish an explicit ‘single-letter’ characterization of the
per-vertex mutual information between the vertex labels and the graph.

The explicit expression of the mutual information is intimately related to estimation-theoretic 
quantities, and -in particular- reveals a phase transition at the critical point for community 
detection. Below the critical point the per-vertex mutual information is asymptotically the same 
as if edges were independent. Correspondingly, no algorithm can estimate the partition better 
than random guessing. Conversely, above the threshold, the per-vertex mutual information is 
strictly smaller than the independent-edges upper bound. In this regime there exists a procedure 
that estimates the vertex labels better than random guessing. 


1 Introduction and main results 

The stochastic block model is the simplest statistical model for networks with a community (or 
cluster) structure. As such, it has attracted a considerable amount of work across statistics, machine
learning, and theoretical computer science [HLL83, DF89, SN97, CK99, ABFX08]. A random graph
$G = (V, E)$ from this model has its vertex set $V$ partitioned into $r$ groups, which are assigned $r$
distinct labels. The probability of edge (i,j) being present depends on the group labels of vertices 
i and j. 

In the context of social network analysis, groups correspond to social communities [HLL83].
For other data-mining applications, they represent latent attributes of the nodes [McS01]. In all of
these cases, we are interested in inferring the vertex labels from a single realization of the graph. 

In this paper we develop an information-theoretic viewpoint on the stochastic block model. 
Namely, we develop an explicit (‘single-letter’) expression for the per-vertex conditional entropy 
of the vertex labels given the graph. Equivalently, we compute the asymptotic per-vertex mutual 
information between the graph and the vertex labels. Our results hold asymptotically for large
networks under suitable conditions on the model parameters. The asymptotic mutual information
is of independent interest, but is also intimately related to estimation-theoretic quantities. 

For the sake of simplicity, we will focus on the symmetric two-group model. Namely, we
assume the vertex set $V = [n] \equiv \{1, 2, \dots, n\}$ to be partitioned into two sets $V = V_+ \cup V_-$, with
$P(i \in V_+) = P(i \in V_-) = 1/2$ independently across vertices $i$. In particular, the size of each group
$|V_+|, |V_-| \sim \mathrm{Binom}(n, 1/2)$ concentrates tightly around its expectation $n/2$. Conditional on the
vertex labels, edges are independent with

$$ P\big((i,j) \in E \,\big|\, V_+, V_-\big) = \begin{cases} p_n & \text{if } \{i,j\} \subseteq V_+ \text{ or } \{i,j\} \subseteq V_-, \\ q_n & \text{otherwise.} \end{cases} \tag{1} $$


Throughout we will denote by $X = (X_i)_{i \in V}$ the set of vertex labels $X_i \in \{+1, -1\}$, and we will be
interested in the conditional entropy $H(X|G)$ or -equivalently- the mutual information $I(X;G)$
in the limit $n \to \infty$. We will write $G \sim \mathrm{SBM}(n; p, q)$ (or $(X, G) \sim \mathrm{SBM}(n; p, q)$) to imply that the
graph $G$ is distributed according to the stochastic block model with $n$ vertices and parameters $p, q$.
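
To make the generative description concrete, the following is a minimal Python sketch that draws a pair $(X, G)$ from the symmetric two-group model of Eq. (1). The function name `sample_sbm` and the use of NumPy are illustrative assumptions and do not appear in the paper.

```python
import numpy as np

def sample_sbm(n, p, q, rng):
    """Draw labels x in {+1,-1}^n and an adjacency matrix g as in Eq. (1).

    Hypothetical helper: p is the within-group edge probability,
    q the across-group edge probability.
    """
    x = rng.choice([-1, 1], size=n)              # i.i.d. uniform labels
    same_group = np.equal.outer(x, x)            # True where labels agree
    probs = np.where(same_group, p, q)           # edge probability per pair
    upper = np.triu(rng.random((n, n)) < probs, k=1)   # independent edges, i < j
    g = (upper | upper.T).astype(int)            # symmetric, zero diagonal
    return x, g

rng = np.random.default_rng(0)
x, g = sample_sbm(200, 0.10, 0.02, rng)
```

The sketch samples only the upper triangle and mirrors it, so each unordered pair of vertices is decided by a single independent coin flip.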

Since we are interested in the large n behavior, two preliminary remarks are in order: 

1. Normalization. We obviously have^1 $0 \le H(X|G) \le n \log 2$. It is therefore natural to study
the per-vertex entropy $H(X|G)/n$.

As we will see, depending on the model parameters, this will take any value between 0 and
$\log 2$.


2. Scaling. The reconstruction problem becomes easier when $p_n$ and $q_n$ are well separated, and
more difficult when they are closer to each other. For instance, in an early contribution, Dyer
and Frieze [DF89] proved that the labels can be reconstructed exactly -modulo an overall
flip- if $p_n = p > q_n = q$ are distinct and independent of $n$. This -in particular- implies
$H(X|G)/n \to 0$ in this limit (in fact, it implies $H(X|G) \to \log 2$). In this regime, the ‘signal’
is so strong that the conditional entropy is trivial. Indeed, recent work [ABH14, MNS14a]
shows that this can also happen with $p_n$ and $q_n$ vanishing, and characterizes the sequences
$(p_n, q_n)$ for which this happens. (See Section 2 for an account of related work.)

Let $\bar p_n = (p_n + q_n)/2$ be the average edge probability. It turns out that the relevant
‘signal-to-noise ratio’ (SNR) is given by the following parameter:

$$ \lambda_n \equiv \frac{n\, (p_n - q_n)^2}{4\, \bar p_n (1 - \bar p_n)} . \tag{2} $$

Indeed, we will see that $H(X|G)/n$ is of order 1, and has a strictly positive limit, when $\lambda_n$ is of
order one. This is also the regime in which the fraction of incorrectly labeled vertices has a
limit that is strictly between 0 and 1.
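
As a quick numerical illustration of the signal-to-noise ratio defined above, the sketch below evaluates $\lambda_n$ from $n$, $p_n$, $q_n$. The helper name `snr` is hypothetical. In the bounded-degree parametrization $p_n = a/n$, $q_n = b/n$, the formula gives $\lambda_n = (a-b)^2 / (2(a+b)(1 - \bar p_n))$, which approaches $(a-b)^2/(2(a+b))$ for large $n$.

```python
def snr(n, p, q):
    """Signal-to-noise ratio lambda_n = n (p - q)^2 / (4 p_bar (1 - p_bar))."""
    p_bar = (p + q) / 2.0
    return n * (p - q) ** 2 / (4.0 * p_bar * (1.0 - p_bar))

# Sparse example with a = 5, b = 1: lambda_n is close to (a - b)^2 / (2 (a + b)).
n = 10 ** 6
lam_sparse = snr(n, 5.0 / n, 1.0 / n)
```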


1.1 Main result: Asymptotic per-vertex mutual information 

As mentioned above, our main result provides a single-letter characterization for the per-vertex 
mutual information. This is given in terms of an effective Gaussian scalar channel. Namely, define 
the Gaussian channel 


$$ Y_0 = Y_0(\gamma) = \sqrt{\gamma}\, X_0 + Z_0 , \tag{3} $$

where $X_0 \sim \mathrm{Uniform}(\{+1, -1\})$ independent^2 of $Z_0 \sim N(0,1)$. We denote by $\mathrm{mmse}(\gamma)$ and $I(\gamma)$ the
corresponding minimum mean square error and mutual information:

$$ I(\gamma) = E \log \left\{ \frac{dp_{Y|X}(Y_0(\gamma) \mid X_0)}{dp_Y(Y_0(\gamma))} \right\} , \tag{4} $$

$$ \mathrm{mmse}(\gamma) = E\big\{ (X_0 - E\{X_0 \mid Y_0(\gamma)\})^2 \big\} . \tag{5} $$

^1 Unless explicitly stated otherwise, logarithms will be in base e, and entropies will be measured in nats.


In the present case, these quantities can be written explicitly as Gaussian integrals of elementary 
functions: 


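$$ I(\gamma) = \gamma - E \log\cosh\big(\gamma + \sqrt{\gamma}\, Z_0\big) , \tag{6} $$

$$ \mathrm{mmse}(\gamma) = 1 - E\big\{ \tanh\big(\gamma + \sqrt{\gamma}\, Z_0\big)^2 \big\} . \tag{7} $$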
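
The Gaussian integrals in Eqs. (6) and (7) are easy to evaluate numerically. The sketch below is one possible approach, using Gauss-Hermite quadrature from NumPy; the function names `mutual_info` and `mmse` and the choice of 80 quadrature nodes are assumptions made for illustration, not part of the paper.

```python
import numpy as np

# Probabilists' Gauss-Hermite rule: nodes/weights for the weight exp(-z^2 / 2).
_nodes, _weights = np.polynomial.hermite_e.hermegauss(80)
_weights = _weights / np.sqrt(2.0 * np.pi)   # normalise so the weights sum to 1

def gauss_mean(values):
    """Approximate E[f(Z)], Z ~ N(0,1), given f evaluated at the nodes."""
    return float(np.sum(_weights * values))

def mutual_info(gamma):
    """I(gamma) = gamma - E log cosh(gamma + sqrt(gamma) Z), Eq. (6), in nats."""
    u = gamma + np.sqrt(gamma) * _nodes
    log_cosh = np.logaddexp(u, -u) - np.log(2.0)   # overflow-safe log cosh
    return gamma - gauss_mean(log_cosh)

def mmse(gamma):
    """mmse(gamma) = 1 - E tanh(gamma + sqrt(gamma) Z)^2, Eq. (7)."""
    u = gamma + np.sqrt(gamma) * _nodes
    return 1.0 - gauss_mean(np.tanh(u) ** 2)
```

A useful sanity check is the I-MMSE relation, $dI/d\gamma = \mathrm{mmse}(\gamma)/2$, which a finite difference of `mutual_info` should reproduce.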


We are now in position to state our main result. 

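Theorem 1.1. For any $\lambda > 0$, let $\gamma_* = \gamma_*(\lambda)$ be the largest non-negative solution of the equation:

$$ \gamma = \lambda \big(1 - \mathrm{mmse}(\gamma)\big) . \tag{8} $$

We refer to $\gamma_*(\lambda)$ as the effective signal-to-noise ratio. Further, define $\Psi(\gamma, \lambda)$ by:

$$ \Psi(\gamma, \lambda) = \frac{\lambda}{4} + \frac{\gamma^2}{4\lambda} - \frac{\gamma}{2} + I(\gamma) . \tag{9} $$

Let the graph $G$ and vertex labels $X$ be distributed according to the stochastic block model with $n$
vertices and parameters $p_n, q_n$ (i.e. $(G, X) \sim \mathrm{SBM}(n; p_n, q_n)$) and define
$\lambda_n \equiv n\,(p_n - q_n)^2 / (4 \bar p_n (1 - \bar p_n))$.

Assume that, as $n \to \infty$, (i) $\lambda_n \to \lambda$ and (ii) $n \bar p_n (1 - \bar p_n) \to \infty$. Then,

$$ \lim_{n \to \infty} \frac{1}{n} I(X; G) = \Psi(\gamma_*(\lambda), \lambda) . \tag{10} $$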

A few remarks are in order. 

Remark 1.2. Of course, we could have stated our result in terms of conditional entropy. Namely 

$$ \lim_{n \to \infty} \frac{1}{n} H(X|G) = \log 2 - \Psi(\gamma_*(\lambda), \lambda) . \tag{11} $$

Remark 1.3. Notice that our assumptions require $n \bar p_n (1 - \bar p_n) \to \infty$ at any, arbitrarily slow, rate.
In words, this corresponds to the graph average degree diverging at any, arbitrarily slow, rate.

Recently (see Section 2 for a discussion of this literature), there has been considerable interest
in the case of bounded average degree, namely

$$ p_n = \frac{a}{n} , \qquad q_n = \frac{b}{n} , \tag{12} $$

with $a, b$ bounded. Our proof gives an explicit error bound in terms of problem parameters even
when $n \bar p_n (1 - \bar p_n)$ is of order one. Hence we are able to characterize the asymptotic mutual
information for large-but-bounded average degree up to an offset that vanishes with the average
degree.

^2 Throughout the paper, we will generally denote scalar equivalents of vector/matrix quantities with the 0 subscript.

Explicitly, we prove that:

$$ \limsup_{n \to \infty} \left| \frac{1}{n} I(X; G) - \Psi(\gamma_*(\lambda), \lambda) \right| \le \frac{C \lambda^{3/2}}{\sqrt{a + b}} , \tag{13} $$

for some absolute constant $C$.


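Our main result and its proof have implications for the minimum error that can be achieved in
estimating the labels $X$ from the graph $G$. For reasons that will become clear below, a natural
metric is given by the matrix minimum mean square error

$$ \mathrm{MMSE}_n(\lambda) \equiv \frac{1}{n(n-1)}\, E\big\{ \| X X^T - E\{X X^T \mid G\} \|_F^2 \big\} . \tag{14} $$

(Occasionally, we will also use the notation $\mathrm{MMSE}(\lambda; n)$ for $\mathrm{MMSE}_n(\lambda)$.) Using the exchangeability
of the indices $\{1, \dots, n\}$, this can also be rewritten as

\begin{align}
\mathrm{MMSE}_n(\lambda) &= \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} E\big\{ [X_i X_j - E\{X_i X_j \mid G\}]^2 \big\} \tag{15} \\
&= E\big\{ [X_1 X_2 - E\{X_1 X_2 \mid G\}]^2 \big\} \tag{16} \\
&= \min_{\hat x_{12} : \mathcal{G}_n \to \mathbb{R}} E\big\{ [X_1 X_2 - \hat x_{12}(G)]^2 \big\} . \tag{17}
\end{align}

(Here $\mathcal{G}_n$ denotes the set of graphs with vertex set $[n]$.) In words, $\mathrm{MMSE}_n(\lambda)$ is the minimum error
incurred in estimating the relative sign of the labels of two given (distinct) vertices. Equivalently,
we can assume that vertex 1 has label $X_1 = +1$. Then $\mathrm{MMSE}_n(\lambda)$ is the minimum mean square
error incurred in estimating the label of any other vertex, say vertex 2. Namely, by symmetry, we
have (see Section 3.3)

\begin{align}
\mathrm{MMSE}_n(\lambda) &= E\big\{ [X_2 - E\{X_2 \mid X_1 = +1, G\}]^2 \,\big|\, X_1 = +1 \big\} \tag{18} \\
&= \min_{\hat x_{2|1} : \mathcal{G}_n \to \mathbb{R}} E\big\{ [X_2 - \hat x_{2|1}(G)]^2 \,\big|\, X_1 = +1 \big\} . \tag{19}
\end{align}

In particular $\mathrm{MMSE}_n(\lambda) \in [0,1]$, with $\mathrm{MMSE}_n(\lambda) \to 1$ corresponding to random guessing.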

Theorem 1.4. Under the assumptions of Theorem 1.1 (in particular assuming $\lambda_n \to \lambda$ as $n \to \infty$),
the following limit holds for the matrix minimum mean square error

$$ \lim_{n \to \infty} \mathrm{MMSE}_n(\lambda_n) = 1 - \frac{\gamma_*(\lambda)^2}{\lambda^2} . \tag{20} $$

Further, this implies $\lim_{n\to\infty} \mathrm{MMSE}_n(\lambda_n) = 1$ for $\lambda \le 1$ and $\lim_{n\to\infty} \mathrm{MMSE}_n(\lambda_n) < 1$ for $\lambda > 1$.

For further discussion of this result and its generalizations, we refer to Section 3. In particular,
Corollary 3.7 establishes that $\lambda = 1$ is a phase transition for other estimation metrics as well, in
particular for overlap and vector mean square error.

Remark 1.5. Like Theorem 1.1, the last theorem holds under the mild condition that the
average degree $n \bar p_n$ diverges at any, arbitrarily slow, rate. This should be contrasted with the phase
transition of naive spectral methods.

It is well understood that the community structure can be estimated by the principal eigenvector
of the centered adjacency matrix $G - E\{G\} = (G - \bar p_n \mathbf{1}\mathbf{1}^T)$. (We denote by $G$ the graph as well as its
adjacency matrix.) This approach is successful for $\lambda > 1$ but requires average degree $n \bar p_n \ge (\log n)^c$
for $c$ a constant [CDMF09, BGN11].
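
The naive spectral method mentioned above can be sketched in a few lines of Python. This is only an illustration in a dense, high-SNR setting; the parameter values, the variable names, and the use of the empirical edge density in place of $\bar p_n$ are assumptions made here, not choices taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 400, 0.20, 0.05                      # illustrative dense parameters

# Sample labels and a symmetric adjacency matrix with zero diagonal.
x = rng.choice([-1, 1], size=n)
probs = np.where(np.equal.outer(x, x), p, q)
upper = np.triu(rng.random((n, n)) < probs, k=1)
g = (upper | upper.T).astype(float)

# Centre with the empirical edge density (a stand-in for p_bar).
p_bar_hat = g.sum() / (n * (n - 1))
centered = g - p_bar_hat * np.ones((n, n))

# eigh returns eigenvalues in ascending order: last column = principal eigenvector.
eigvals, eigvecs = np.linalg.eigh(centered)
x_hat = np.sign(eigvecs[:, -1])

# Labels are identifiable only up to a global flip, hence the absolute value.
overlap = abs(float(np.dot(x, x_hat))) / n
```

With these parameters the signal-to-noise ratio is far above 1 and the average degree is large, so the estimate should agree with the true labels on most vertices; the regime of slowly diverging degree discussed in the remark is exactly where this simple approach is not guaranteed to work.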
Remark 1.6. Our proof of Theorem 1.1 and Theorem 1.4 involves the analysis of a Gaussian
observation model, whereby the rank one matrix $X X^T$ is corrupted by additive Gaussian noise,
according to $Y = \sqrt{\lambda/n}\, X X^T + Z$. In particular, we prove a single letter characterization of
the asymptotic mutual information per dimension in this model $\lim_{n\to\infty} n^{-1} I(X; Y)$, cf. Theorem
4.3 below. The resulting asymptotic value is proved to coincide with the asymptotic value in the
stochastic block model, as established in Theorem 1.1. In other words, the per-dimension mutual
information turns out to be universal across multiple noise models.


1.2 Outline of the paper 

In Section 2 we review the literature on this problem. We then discuss the connection with
estimation in Section 3. This section also demonstrates how to evaluate the asymptotic formula in
Theorem 1.1.

Section 4 describes the proof strategy. As an intermediate step, we introduce a Gaussian
observation model which is of independent interest. The proof of Theorem 1.1 is reduced to two
main propositions:

• Proposition 4.1 establishes that -within the regime defined in Theorem 1.1- the stochastic
block model is asymptotically equivalent to the Gaussian observation model (see Section 4.1
for a formal definition). This statement (with explicit error bounds) is proved in a later section
through a careful application of the Lindeberg method.

• Proposition 4.2 develops a single-letter characterization of the asymptotic per-vertex mutual
information of the Gaussian observation model. The proof of this fact is presented in a later
section and builds on two steps. We first prove an asymptotic upper bound on the matrix minimum
mean square error $\mathrm{MMSE}_n(\lambda)$ using an approximate message passing (AMP) algorithm. We
then use an area theorem to prove that this upper bound is tight.

Finally, the proof of Theorem 1.4 is given in a later section. Several technical details are deferred to
the appendices.


1.3 Notations 


The set of first n integers is denoted by [n] = {1, 2,..., n}. 

When possible, we will follow the convention of denoting random variables by upper-case letters 
(e.g. X,Y, Z,...), and their values by lower case letters (e.g. x,y, z,...). We use boldface for 
vectors and matrices, e.g. X for a random vector and x for a deterministic vector. The graph G 
will be identified with its adjacency matrix. Namely, with a slight abuse of notation, we will use G 
both to denote a graph G = (V = [n],E) (with V = [n] the vertex set, and E the edge set, i.e. a 
set of unordered pairs of vertices), and its adjacency matrix. This is a symmetric zero-one matrix
$G = (G_{ij})_{1 \le i,j \le n}$ with entries

$$ G_{ij} = \begin{cases} 1 & \text{if } (i,j) \in E, \\ 0 & \text{otherwise.} \end{cases} \tag{21} $$

Throughout we assume $G_{ii} = 0$ by convention.

We write $f_1(n) = f_2(n) + O(f_3(n))$ to mean that $|f_1(n) - f_2(n)| \le C f_3(n)$ for a universal
constant $C$. We denote by $C$ a generic (large) constant that is independent of problem parameters,
whose value can change from line to line.

We say that an event holds with high probability if it holds with probability converging to one
as $n \to \infty$.

We denote the $\ell_2$ norm of a vector $x$ by $\|x\|_2$ and the Frobenius norm of a matrix $Y$ by $\|Y\|_F$.
The ordinary scalar product of vectors $a, b \in \mathbb{R}^m$ is denoted as $\langle a, b \rangle \equiv \sum_{i=1}^m a_i b_i$.

Unless stated otherwise, logarithms will be taken in the natural base, and entropies measured
in nats.


2 Related work 


The stochastic block model was first introduced within the social science literature in [HLL83].
Around the same time, it was studied within theoretical computer science [BCLS87, DF89], under
the name of ‘planted partition model.’

A large part of the literature has focused on the problem of exact recovery of the community
(cluster) structure. A long series of papers [BCLS87, DF89, Bop87, SN97, JS98, CK99, CI01,
McS01, BC09, RCY11, CWA12, CSX12, Vu14, YC14] establishes sufficient conditions on the gap
between $p_n$ and $q_n$ that guarantee exact recovery of the vertex labels with high probability. A sharp
threshold for exact recovery was obtained in [ABH14, MNS14a], showing that for $p_n = \alpha \log(n)/n$,
$q_n = \beta \log(n)/n$, $\alpha, \beta > 0$, exact recovery is solvable (and efficiently so) if and only if
$\sqrt{\alpha} - \sqrt{\beta} \ge \sqrt{2}$.

Efficient algorithms for this problem were also developed in [YP14, BH14, Ban15]. For the SBM
with arbitrarily many communities, necessary and sufficient conditions for exact recovery were
recently obtained in [AS15]. The resulting sharp threshold is efficiently achievable and is stated in
terms of a CH-divergence.

A parallel line of work studied the detection problem. In this case, the estimated community 
structure is only required to be asymptotically positively correlated with the ground truth. For 
this requirement, two independent groups [Mas14, MNS14b] proved that detection is solvable (and
so efficiently) if and only if $(a - b)^2 > 2(a + b)$, when $p_n = a/n$, $q_n = b/n$. This settles a conjecture
made in [DKMZ11] and improves on earlier work [Co10]. Results for detection with more than two
communities were recently obtained in [GV14, CRV15, AS15, BLM15]. A variant of community
detection with a single hidden community in a sparse graph was studied in [Mon15].

In a sense, the present paper bridges detection and exact recovery, by characterizing the minimum
estimation error when this is non-zero, but -for $\lambda > 1$- smaller than for random guessing.

An information-theoretic view of the SBM was first introduced in [AM13, AM15]. There it was
shown that in the regime of $p_n = a/n$, $q_n = b/n$, and $a < b$ (i.e., disassortative communities), the
normalized mutual information $I(X;G)/n$ admits a limit as $n \to \infty$. This result is obtained by
showing that the conditional entropy $H(X|G)$ is sub-additive in $n$, using an interpolation method
for planted models. While the result of [AM13, AM15] holds for arbitrary $a < b$ (possibly small)
and extends to a broad family of planted models, the existence of the limit in the assortative case
$a > b$ is left open. Further, sub-additivity methods do not provide any insight as to the limit value.

For the partial recovery of the communities, it was shown in [MNS14a] that the communities
can be recovered up to a vanishing fraction of the nodes if and only if $n(p - q)^2/(p + q)$ diverges.
This is generalized in [AS15] to the case of more than two communities. In these regimes, the
normalized mutual information $I(X;G)/n$ (as studied in this paper) tends to $\log 2$ nats. For the
constant degree regime, it was shown in [MNS13] that when $(a - b)^2/(a + b)$ is sufficiently large,
the fraction of nodes that can be recovered is determined by the broadcasting problem on trees
[EKPS00]. Namely, consider the reconstruction problem whereby a bit is broadcast on a
Galton-Watson tree with Poisson$((a + b)/2)$ offspring and with binary symmetric channels of bias $b/(a + b)$
on each branch. Then the probability of recovering the bit correctly from the leaves at large depth
gives the fraction of nodes that can be correctly labeled in the SBM.

In terms of proof techniques, our arguments are closest to [KM11, DM14]. We use the
well-known Lindeberg strategy to reduce computation of mutual information in the SBM to mutual
information of the Gaussian observation model. We then compute the latter mutual information
by developing sharp algorithmic upper bounds, which are then shown to be asymptotically tight
via an area theorem. The Lindeberg strategy builds from [KM11, Cha06] while the area theorem
argument also appeared in [MT06]. We expect these techniques to be more broadly applicable
to compute quantities like normalized mutual information or conditional entropy in a variety of
models.

Let us finally mention that the results obtained in this paper are likely to extend to more general
SBMs with multiple communities, to the Censored Block Model studied in [AM15, ABBS14a,
HG13, CHG14, CG14, ABBS14b, GRSY14, BH14, CRV15, SKLZ15], the Labeled Block Model
[HLM12, XLM14], and other variants of block models. In particular, it would be interesting to
understand which estimation-theoretic quantities appear for these models, and whether a general
result stands behind the case of this paper.

While this paper was in preparation, Lesieur, Krzakala and Zdeborová [LKZ15] studied estimation
of low-rank matrices observed through noisy memoryless channels. They conjectured that
the resulting minimal estimation error is universal across a variety of channel models. Our proof
(see below) establishes universality across two such models: the Gaussian and the binary
output channels. We expect that similar techniques can be useful to prove universality for other
models as well.


3 Estimation phase transition 


In this section we discuss how to evaluate the asymptotic formulae in Theorem 1.1 and Theorem
1.4. We then discuss the consequences of our results for various estimation metrics.

Before passing to these topics, we will derive a simple upper bound on the per-vertex mutual 
information, which will be a useful comparison for our results. 


3.1 An elementary upper bound 

It is instructive to start with an elementary upper bound on $I(X; G)$.

Lemma 3.1. Assume $p_n, q_n$ satisfy the assumptions of Theorem 1.1 (in particular (i) $\lambda_n \to \lambda$ and
(ii) $n \bar p_n (1 - \bar p_n) \to \infty$). Then

$$ \limsup_{n \to \infty} \frac{1}{n} I(X; G) \le \frac{\lambda}{4} . \tag{22} $$

Figure 1: Illustration of the fixed point equation Eq. (8). The ‘effective signal-to-noise ratio’ $\gamma_*(\lambda)$
is given by the intersection of the curve $\gamma \mapsto G(\gamma) = 1 - \mathrm{mmse}(\gamma)$, and the line $\gamma/\lambda$.


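Proof. We have

\begin{align}
\frac{1}{n} I(X; G) &= \frac{1}{n} H(G) - \frac{1}{n} H(G|X) \tag{23} \\
&\overset{(a)}{=} \frac{1}{n} H(G) - \frac{1}{n} \sum_{1 \le i < j \le n} H(G_{ij} | X) \tag{24} \\
&\overset{(b)}{=} \frac{1}{n} H(G) - \frac{1}{n} \sum_{1 \le i < j \le n} H(G_{ij} | X_i \cdot X_j) \tag{25} \\
&\le \frac{1}{n} \sum_{1 \le i < j \le n} I(X_i \cdot X_j; G_{ij}) = \frac{n-1}{2}\, I(X_1 \cdot X_2; G_{12}) , \tag{26}
\end{align}

where (a) follows since $\{G_{ij}\}_{i<j}$ are conditionally independent given $X$ and (b) because $G_{ij}$ only
depends on $X$ through the product $X_i \cdot X_j$ (notice that there is no comma but a product in
$H(G_{ij} | X_i \cdot X_j)$). The final inequality uses the subadditivity of entropy, $H(G) \le \sum_{i<j} H(G_{ij})$.

From our model, it is easy to check that

$$ I(X_1 \cdot X_2; G_{12}) = \frac{1}{2} p_n \log\frac{p_n}{\bar p_n} + \frac{1}{2} q_n \log\frac{q_n}{\bar p_n} + \frac{1}{2} (1 - p_n) \log\frac{1 - p_n}{1 - \bar p_n} + \frac{1}{2} (1 - q_n) \log\frac{1 - q_n}{1 - \bar p_n} . \tag{27} $$

The claim follows by substituting $p_n = \bar p_n + \sqrt{\bar p_n (1 - \bar p_n) \lambda_n / n}$, $q_n = \bar p_n - \sqrt{\bar p_n (1 - \bar p_n) \lambda_n / n}$ and
by Taylor expansion^3. □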
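
The Taylor-expansion step can be checked numerically. The sketch below evaluates the right-hand side of Eq. (27) and the resulting bound from Eq. (26) for one illustrative parameter choice (the values of `n`, `p_bar`, `lam` and the helper name `pair_information` are assumptions made for this example) and compares it with $\lambda/4$.

```python
import numpy as np

def pair_information(p, q):
    """Right-hand side of Eq. (27): I(X_1 X_2; G_12) in nats."""
    p_bar = (p + q) / 2.0
    return 0.5 * (p * np.log(p / p_bar) + q * np.log(q / p_bar)
                  + (1 - p) * np.log((1 - p) / (1 - p_bar))
                  + (1 - q) * np.log((1 - q) / (1 - p_bar)))

n, p_bar, lam = 10_000, 0.1, 2.0
delta = np.sqrt(p_bar * (1 - p_bar) * lam / n)       # half of p_n - q_n
p, q = p_bar + delta, p_bar - delta

upper_bound = (n - 1) / 2 * pair_information(p, q)   # right-hand side of Eq. (26)
gap = upper_bound - lam / 4                          # small when n p_bar (1 - p_bar) is large
```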


3.2 Evaluation of the asymptotic formula 

Our asymptotic expression for the mutual information, cf. Theorem 1.1, and for the estimation
error, cf. Theorem 1.4, depends on the solution of Eq. (8), which we copy here for the reader’s
convenience:

$$ \gamma = \lambda \big(1 - \mathrm{mmse}(\gamma)\big) \equiv \lambda\, G(\gamma) . \tag{28} $$

Here we defined

$$ G(\gamma) = 1 - \mathrm{mmse}(\gamma) = E\big\{ \tanh\big(\gamma + \sqrt{\gamma}\, Z_0\big)^2 \big\} . \tag{29} $$

Figure 2: Left frame: Asymptotic mutual information per vertex of the two-groups stochastic
block model, as a function of the signal-to-noise ratio $\lambda$. The dashed lines are simple upper bounds:
$\lim_{n\to\infty} I(X; G)/n \le \lambda/4$ (cf. Lemma 3.1) and $I(X;G)/n \le \log 2$. Right frame: Asymptotic
estimation error under different metrics (see Section 3.3). Note the phase transition at $\lambda = 1$ in
both frames.

^3 Indeed Taylor expansion yields the stronger result $n^{-1} I(X; G) \le (\lambda_n/4) + n^{-1}$ for all $n$ large enough.


The effective signal-to-noise ratio $\gamma_*(\lambda)$ that enters Theorem 1.1 and Theorem 1.4 is the largest
non-negative solution of Eq. (28). This equation is illustrated in Figure 1.

It is immediate to show from the definition (29) that $G(\cdot)$ is continuous on $[0, \infty)$ with $G(0) = 0$,
and $\lim_{\gamma\to\infty} G(\gamma) = 1$. This in particular implies that $\gamma = 0$ is always a solution of Eq. (28). Further,
since $\mathrm{mmse}(\gamma)$ is monotone decreasing in the signal-to-noise ratio $\gamma$, $G(\gamma)$ is monotone increasing.
As shown in the proof of Remark 6.1 (see Appendix B.2), $G(\cdot)$ is also strictly concave on $[0, \infty)$.
This implies that Eq. (28) has at most one solution in $(0, \infty)$, and a strictly positive solution only
exists if $\lambda G'(0) = \lambda > 1$.

We summarize these remarks below, and refer to Figure 2 for an illustration.


Lemma 3.2. The effective SNR, and the asymptotic expression for the per-vertex mutual information
in Theorem 1.1, have the following properties:

• For $\lambda \le 1$, we have $\gamma_*(\lambda) = 0$ and $\Psi(\gamma_*(\lambda), \lambda) = \lambda/4$.

• For $\lambda > 1$, we have $\gamma_*(\lambda) \in (0, \lambda)$ strictly with $\gamma_*(\lambda)/\lambda \to 1$ as $\lambda \to \infty$.

Further, $\Psi(\gamma_*(\lambda), \lambda) < \lambda/4$ strictly with $\Psi(\gamma_*(\lambda), \lambda) \to \log 2$ as $\lambda \to \infty$.

Proof. All of the claims follow immediately from the previous remarks, and simple calculus, except
the claim $\Psi(\gamma_*(\lambda), \lambda) < \lambda/4$ for $\lambda > 1$. This is a direct consequence of the variational characterization
established below. □

We next give an alternative (variational) characterization of the asymptotic formula which is
useful for proving bounds.

Lemma 3.3. Under the assumptions and definitions of Theorem 1.1, we have

$$ \lim_{n \to \infty} \frac{1}{n} I(X; G) = \Psi(\gamma_*(\lambda), \lambda) = \min_{\gamma \in [0, \infty)} \Psi(\gamma, \lambda) . \tag{30} $$


Proof. The function $\gamma \mapsto \Psi(\gamma, \lambda)$ is differentiable on $[0, \infty)$ with $\Psi(\gamma, \lambda) = \gamma^2/(4\lambda) + O(\gamma) \to \infty$
as $\gamma \to \infty$. Hence, the $\min_{\gamma \in [0,\infty)} \Psi(\gamma, \lambda)$ is achieved at a point where the first derivative vanishes
(or, possibly, at 0). Using the I-MMSE relation [GSV05], we get

$$ \frac{\partial \Psi}{\partial \gamma}(\gamma, \lambda) = \frac{\gamma}{2\lambda} - \frac{1}{2} + \frac{1}{2}\, \mathrm{mmse}(\gamma) . \tag{31} $$

Hence the minimizer is a solution of Eq. (28). As shown above, for $\lambda \le 1$, the only solution is
$\gamma_*(\lambda) = 0$, which therefore yields $\Psi(\gamma_*(\lambda), \lambda) = \min_{\gamma \in [0,\infty)} \Psi(\gamma, \lambda)$ as claimed.

For $\lambda > 1$, Eq. (28) admits the two solutions: 0 and $\gamma_*(\lambda) > 0$. However, by expanding Eq. (31)
for small $\gamma$, we obtain $\Psi(\gamma, \lambda) = \Psi(0, \lambda) - (1 - \lambda^{-1}) \gamma^2/4 + o(\gamma^2)$ and hence $\gamma = 0$ is a local maximum,
which implies the claim for $\lambda > 1$ as well. □
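
The two calculus steps used in the proof can be spelled out as follows. This is a short worked sketch that relies only on the definition of $\Psi$, the I-MMSE identity $dI/d\gamma = \mathrm{mmse}(\gamma)/2$, and the facts $G(0) = 0$, $G'(0) = 1$ noted earlier in this section.

```latex
% Stationary points of Psi are exactly the solutions of the fixed point equation:
\frac{\partial \Psi}{\partial \gamma}(\gamma,\lambda)
  = \frac{\gamma}{2\lambda} - \frac{1}{2} + \frac{1}{2}\,\mathrm{mmse}(\gamma) = 0
\quad\Longleftrightarrow\quad
\gamma = \lambda\bigl(1 - \mathrm{mmse}(\gamma)\bigr).

% Small-gamma expansion: mmse(gamma) = 1 - G(gamma) = 1 - gamma + o(gamma), hence
\frac{\partial \Psi}{\partial \gamma}(\gamma,\lambda)
  = \frac{\gamma}{2\lambda} - \frac{\gamma}{2} + o(\gamma)
  = -\frac{1}{2}\bigl(1 - \lambda^{-1}\bigr)\gamma + o(\gamma),
\qquad
\Psi(\gamma,\lambda) = \Psi(0,\lambda) - \bigl(1 - \lambda^{-1}\bigr)\frac{\gamma^{2}}{4} + o(\gamma^{2}).
```

For $\lambda > 1$ the coefficient of $\gamma^2$ is negative, which is why $\gamma = 0$ cannot be the minimizer in that regime.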


We conclude by noting that Eq. (28) can be solved numerically rather efficiently. The simplest
method is iteration. Namely, we initialize $\gamma^0 = \lambda$ and then iterate $\gamma^{t+1} = \lambda G(\gamma^t)$. This
approach was used for Figure 2.
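
The iteration just described can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for the figures: the Gaussian expectations are approximated by Gauss-Hermite quadrature, and the names `g_fun`, `gamma_star`, `psi`, the tolerance and the iteration cap are all choices made here.

```python
import numpy as np

_z, _w = np.polynomial.hermite_e.hermegauss(80)   # nodes/weights for exp(-z^2/2)
_w = _w / np.sqrt(2.0 * np.pi)                    # normalise to a probability rule

def g_fun(gamma):
    """G(gamma) = 1 - mmse(gamma) = E tanh(gamma + sqrt(gamma) Z)^2."""
    return float(np.sum(_w * np.tanh(gamma + np.sqrt(gamma) * _z) ** 2))

def mutual_info(gamma):
    """I(gamma) = gamma - E log cosh(gamma + sqrt(gamma) Z), in nats."""
    u = gamma + np.sqrt(gamma) * _z
    return gamma - float(np.sum(_w * (np.logaddexp(u, -u) - np.log(2.0))))

def gamma_star(lam, tol=1e-12, max_iter=100_000):
    """Iterate gamma <- lam * G(gamma), starting from gamma = lam."""
    gamma = lam
    for _ in range(max_iter):
        new = lam * g_fun(gamma)
        if abs(new - gamma) < tol:
            return new
        gamma = new
    return gamma

def psi(gamma, lam):
    """Psi(gamma, lam) = lam/4 + gamma^2/(4 lam) - gamma/2 + I(gamma)."""
    return lam / 4 + gamma ** 2 / (4 * lam) - gamma / 2 + mutual_info(gamma)

for lam in (0.5, 2.0, 4.0):
    gs = gamma_star(lam)
    # per-vertex mutual information limit and the limit 1 - gamma_*^2 / lam^2
    print(lam, gs, psi(gs, lam), 1 - gs ** 2 / lam ** 2)
```

Starting from $\gamma^0 = \lambda$ matters: since $G$ is increasing and concave, the iterates decrease monotonically to the largest fixed point, whereas starting at 0 would stay at the trivial solution. Convergence is slow near $\lambda = 1$, where the two fixed points merge.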


3.3 Consequences for estimation 

Theorem 1.4 establishes that a phase transition takes place at $\lambda = 1$ for the matrix minimum mean
square error $\mathrm{MMSE}_n(\lambda_n)$ defined in Eq. (14). Throughout this section, we will omit the subscript
$n$ to denote the $n \to \infty$ limit (for instance, we write $\mathrm{MMSE}(\lambda) \equiv \lim_{n\to\infty} \mathrm{MMSE}_n(\lambda_n)$).

Figure 2 reports the asymptotic prediction for $\mathrm{MMSE}(\lambda)$ stated in Theorem 1.4 and evaluated
as discussed above. The error decreases rapidly to 0 for $\lambda > 1$.

In this section we discuss two other estimation metrics. In both cases we define these metrics by 
optimizing a suitable risk over a class of estimators: it is understood that randomized estimators 
are admitted as well. 


• The first metric is the vector minimum mean square error:

$$ \mathrm{vmmse}_n(\lambda_n) \equiv \frac{1}{n} \inf_{\hat x : \mathcal{G}_n \to \mathbb{R}^n} E\Big\{ \min_{s \in \{+1, -1\}} \| X - s\, \hat x(G) \|_2^2 \Big\} . \tag{32} $$

Note the minimization over the sign $s$: this is necessary because the vertex labels can be
estimated only up to an overall flip. Of course $\mathrm{vmmse}_n(\lambda_n) \in [0, 1]$, since it is always possible
to achieve vector mean square error equal to one by returning $\hat x(G) = 0$.

• The second metric is the overlap:

$$ \mathrm{Overlap}_n(\lambda_n) \equiv \frac{1}{n} \sup_{\hat s : \mathcal{G}_n \to \{+1, -1\}^n} E\big\{ |\langle X, \hat s(G) \rangle| \big\} . \tag{33} $$

Again $\mathrm{Overlap}_n(\lambda_n) \in [0,1]$ (but now large overlap corresponds to good estimation). Indeed,
by returning $\hat s_i(G) \in \{+1, -1\}$ uniformly at random, we obtain
$E\{|\langle X, \hat s(G) \rangle|\}/n = O(n^{-1/2}) \to 0$.

Note that the main difference between overlap and vector minimum mean square error is that
in the latter case we consider estimators $\hat x : \mathcal{G}_n \to \mathbb{R}^n$ taking arbitrary real values, while in
the former we assume estimators $\hat s : \mathcal{G}_n \to \{+1, -1\}^n$ taking binary values.
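
For a single realization and a given estimate, the quantities inside the expectations of Eqs. (32) and (33) can be computed as in the sketch below. The function names are hypothetical, and these are per-sample values only: the metrics defined above additionally take an expectation and optimize over all estimators.

```python
import numpy as np

def empirical_overlap(x, s_hat):
    """|<x, s_hat>| / n for binary estimates s_hat in {+1,-1}^n."""
    return abs(float(np.dot(x, s_hat))) / len(x)

def sign_aligned_sq_error(x, x_hat):
    """min over s in {+1,-1} of ||x - s x_hat||_2^2 / n for real-valued x_hat."""
    return min(float(np.sum((x - s * x_hat) ** 2)) for s in (+1, -1)) / len(x)

x = np.array([1, -1, 1, 1, -1, -1])
flipped = -x                      # the globally flipped labelling
trivial = np.zeros(len(x))        # the all-zero estimate
```

Both functions are invariant under a global sign flip of the estimate, which reflects the fact that the labels are identifiable only up to such a flip.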


In order to clarify the relation between various metrics, we begin by proving the alternative 
characterization of the matrix minimum mean square error in Eqs. (18), (19). 


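Lemma 3.4. Letting $\mathrm{MMSE}_n(\lambda)$ be defined as per Eq. (14), we have

\begin{align}
\mathrm{MMSE}_n(\lambda) &= E\big\{ [X_2 - E\{X_2 \mid X_1 = +1, G\}]^2 \,\big|\, X_1 = +1 \big\} \tag{34} \\
&= \min_{\hat x_{2|1} : \mathcal{G}_n \to \mathbb{R}} E\big\{ [X_2 - \hat x_{2|1}(G)]^2 \,\big|\, X_1 = +1 \big\} . \tag{35}
\end{align}

Proof. First note that Eq. (35) follows immediately from Eq. (34) since conditional expectation
minimizes the mean square error (the conditioning only changes the prior on $X$).

In order to prove Eq. (34), we start from Eq. (16). Since the prior distribution on $X_1$ is uniform,
we have

\begin{align}
E\{X_1 X_2 \mid G\} &= \frac{1}{2} E\{X_1 X_2 \mid X_1 = +1, G\} + \frac{1}{2} E\{X_1 X_2 \mid X_1 = -1, G\} \tag{36} \\
&= E\{X_2 \mid X_1 = +1, G\} , \tag{37}
\end{align}

where in the second line we used the fact that, conditional on $G$, $X$ is distributed as $-X$. Continuing
from Eq. (16), we get

\begin{align}
\mathrm{MMSE}_n(\lambda) &= E\big\{ [X_1 X_2 - E\{X_2 \mid X_1 = +1, G\}]^2 \big\} \tag{38} \\
&= \frac{1}{2} E\big\{ [X_1 X_2 - E\{X_2 \mid X_1 = +1, G\}]^2 \,\big|\, X_1 = +1 \big\} \nonumber \\
&\quad + \frac{1}{2} E\big\{ [X_1 X_2 - E\{X_2 \mid X_1 = +1, G\}]^2 \,\big|\, X_1 = -1 \big\} \tag{39} \\
&= E\big\{ [X_2 - E\{X_2 \mid X_1 = +1, G\}]^2 \,\big|\, X_1 = +1 \big\} , \tag{40}
\end{align}

which proves the claim. □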


The next lemma clarifies the relationship between matrix and vector minimum mean square
error. Its proof is deferred to Appendix A.1.

Lemma 3.5. With the above definitions, we have

$$ 1 - \sqrt{1 - (1 - n^{-1})\, \mathrm{MMSE}_n(\lambda)} \le \mathrm{vmmse}_n(\lambda) \le \mathrm{MMSE}_n(\lambda) . \tag{41} $$

Finally, a lemma that relates overlap and vector minimum mean square error, whose proof can
be found in Appendix A.2.

Lemma 3.6. With the above definitions, we have

$$ \mathrm{Overlap}_n(\lambda) \ge 1 - \mathrm{vmmse}_n(\lambda) - O(n^{-1/2}) . \tag{42} $$
As an immediate corollary of these lemmas (together with Theorem 1.4 and Lemma 3.2), we
obtain that $\lambda = 1$ is the critical point for other estimation metrics as well.


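Corollary 3.7. The vector minimum mean square error and the overlap exhibit a phase transition
at $\lambda = 1$. Namely, under the assumptions of Theorem 1.1 (in particular, $\lambda_n \to \lambda$ and
$n \bar p_n (1 - \bar p_n) \to \infty$), we have

• If $\lambda \le 1$, then estimation cannot be performed asymptotically better than without any
information:

\begin{align}
\lim_{n \to \infty} \mathrm{vmmse}_n(\lambda_n) &= 1 , \tag{43} \\
\lim_{n \to \infty} \mathrm{Overlap}_n(\lambda_n) &= 0 . \tag{44}
\end{align}

• If $\lambda > 1$, then estimation can be performed better than without any information, even in the
limit $n \to \infty$:

\begin{align}
0 < 1 - \frac{\gamma_*(\lambda)}{\lambda} \le \liminf_{n \to \infty} \mathrm{vmmse}_n(\lambda_n) \le \limsup_{n \to \infty} \mathrm{vmmse}_n(\lambda_n) \le 1 - \frac{\gamma_*(\lambda)^2}{\lambda^2} < 1 , \tag{45} \\
0 < \frac{\gamma_*(\lambda)^2}{\lambda^2} \le \liminf_{n \to \infty} \mathrm{Overlap}_n(\lambda_n) . \tag{46}
\end{align}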

n^oo 


4 Proof strategy: Theorem 1.1

In this section we describe the main elements used in the proof of Theorem 1.1:

• We describe a Gaussian observation model which has asymptotically the same mutual
information as the SBM introduced above.

• We state an asymptotic characterization of the mutual information of this Gaussian model.

• We describe an approximate message passing (AMP) estimation algorithm that plays a key
role in the last characterization.


We then use these technical results (proved in later sections) to prove Theorem 1.1 in Section 4.3.
We recall that $\bar p_n = (p_n + q_n)/2$. Define the gap $\Delta_n \equiv (p_n - q_n)/2 = \sqrt{\lambda_n \bar p_n (1 - \bar p_n)/n}$. We
will assume for the proofs that $\lim_{n\to\infty} \lambda_n = \lambda > 0$ with $\Delta_n > 0$ (i.e. the assortative model), but the
results also hold for $\Delta_n < 0$ in an analogous fashion.


4.1 Gaussian model 

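The edges are conditionally independent given the vertex labels $X$, with distribution:

$$ G_{ij} = \begin{cases} 1 & \text{with probability } \bar p_n + \Delta_n X_i X_j , \\ 0 & \text{with probability } 1 - \bar p_n - \Delta_n X_i X_j . \end{cases} \tag{47} $$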

As a first step, we compare the SBM with an alternate Gaussian observation model defined as
follows. Let $Z$ be a Gaussian random symmetric matrix generated with independent entries
$Z_{ij} \sim N(0,1)$ and $Z_{ii} \sim N(0,2)$, independent of $X$. Consider the noisy observations $Y = Y(\lambda)$ defined
by

$$ Y(\lambda) = \sqrt{\frac{\lambda}{n}}\, X X^T + Z . \tag{48} $$

Note that this model matches the first two moments of the original model. More precisely, if we
define the rescaled adjacency matrix $G^{\rm res}_{ij} \equiv (G_{ij} - \bar p_n)/\sqrt{\bar p_n (1 - \bar p_n)}$, then
$E\{G^{\rm res}_{ij} | X\} = E\{Y_{ij} | X\}$ and
$\mathrm{Var}(G^{\rm res}_{ij} | X) = \mathrm{Var}(Y_{ij} | X) + O(\Delta_n / (\bar p_n (1 - \bar p_n)))$.
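
The moment matching can be illustrated numerically. The sketch below samples from the Gaussian model of Eq. (48) and computes the exact conditional mean and variance of a rescaled SBM entry given $X_i X_j = s$; the helper names and parameter values are assumptions made for this example.

```python
import numpy as np

def sample_gaussian_model(x, lam, rng):
    """Y = sqrt(lam / n) x x^T + Z with Z symmetric Gaussian noise."""
    n = len(x)
    a = rng.standard_normal((n, n))
    z = (a + a.T) / np.sqrt(2.0)   # off-diagonal variance 1, diagonal variance 2
    return np.sqrt(lam / n) * np.outer(x, x) + z

def rescaled_sbm_moments(n, p_bar, lam, s):
    """Conditional mean and variance of the rescaled entry given X_i X_j = s."""
    delta = np.sqrt(lam * p_bar * (1 - p_bar) / n)    # the gap Delta_n
    edge_prob = p_bar + delta * s                     # Bernoulli parameter, Eq. (47)
    mean = (edge_prob - p_bar) / np.sqrt(p_bar * (1 - p_bar))
    var = edge_prob * (1 - edge_prob) / (p_bar * (1 - p_bar))
    return mean, var

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=300)
y = sample_gaussian_model(x, lam=2.0, rng=rng)
mean, var = rescaled_sbm_moments(n=10_000, p_bar=0.1, lam=2.0, s=+1)
```

The conditional mean equals $\sqrt{\lambda/n}\, s$ exactly, matching $E\{Y_{ij}|X\}$, while the conditional variance differs from 1 by a term that vanishes as $n \bar p_n (1 - \bar p_n)$ grows.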

Our first proposition proves that the mutual information between the vertex labels X and the 
observations agrees to leading order across the two models. 

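Proposition 4.1. Assume that, as $n \to \infty$, (i) $\lambda_n \to \lambda$ and (ii) $n \bar p_n (1 - \bar p_n) \to \infty$. Then there is
a constant $C$ independent of $n$ such that

$$ \frac{1}{n} \big| I(X; G) - I(X; Y) \big| \le C \left( \frac{\lambda^{3/2}}{\sqrt{n \bar p_n (1 - \bar p_n)}} + |\lambda_n - \lambda| \right) . \tag{49} $$

The proof of this result is presented in a later section.

The next step consists in analyzing the Gaussian model (48), which is of independent interest.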

It turns out to be convenient to embed this in a more general model whereby, in addition to the
observations $Y$, we are also given observations of $X$ through a binary erasure channel with erasure
probability $\bar\varepsilon = 1 - \varepsilon$, BEC($\bar\varepsilon$). We will denote by $X(\varepsilon) = (X_1(\varepsilon), \dots, X_n(\varepsilon))$ the output of this
channel, where we set $X_i(\varepsilon) = 0$ every time the symbol is erased. Formally we have

$$ X_i(\varepsilon) = B_i X_i , \tag{50} $$

where $B_i \sim \mathrm{Ber}(\varepsilon)$ are independent random variables, independent of $X, G$. In the special case
$\varepsilon = 0$, all of these observations are trivial, and we recover the original model.

The reason for introducing the additional observations $X(\varepsilon)$ is the following. The graph $G$ has
the same distribution conditional on $X$ or $-X$, hence it is impossible to recover the sign of $X$. As
we will see, the extra observations $X(\varepsilon)$ allow us to break this trivial symmetry and we will recover
the required results by continuity in $\varepsilon$ as the extra information vanishes.

Indeed, our next result establishes a single letter characterization of $I(X; Y, X(\varepsilon))$ in terms of
a recalibrated scalar observation problem. Namely, we define the following observation model for
$X_0 \sim \mathrm{Uniform}(\{+1, -1\})$ a Rademacher random variable:

\begin{align}
Y_0 &= \sqrt{\gamma}\, X_0 + Z_0 , \tag{51} \\
X_0(\varepsilon) &= B_0 X_0 . \tag{52}
\end{align}

Here $X_0$, $B_0 \sim \mathrm{Ber}(\varepsilon)$, $Z_0 \sim N(0,1)$ are mutually independent. We denote by $\mathrm{mmse}(\gamma, \varepsilon)$ the
minimum mean squared error of estimating $X_0$ from $X_0(\varepsilon), Y_0$, conditional on $B_0$. Recall the
definitions (4), (5) of $I(\gamma)$, $\mathrm{mmse}(\gamma)$, and the expressions (6), (7). A simple calculation yields

\begin{align}
\mathrm{mmse}(\gamma, \varepsilon) &= E\big\{ (X_0 - E\{X_0 \mid X_0(\varepsilon), Y_0\})^2 \big\} \tag{53} \\
&= (1 - \varepsilon)\, \mathrm{mmse}(\gamma) . \tag{54}
\end{align}
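
The simple calculation behind Eq. (54) can be written out by conditioning on whether the symbol is revealed. The following is a short sketch of that step, using only the independence of $B_0$ from $(X_0, Z_0)$:

```latex
\mathrm{mmse}(\gamma,\varepsilon)
  = \underbrace{P(B_0 = 1)}_{=\,\varepsilon} \cdot 0
  \;+\; \underbrace{P(B_0 = 0)}_{=\,1-\varepsilon} \cdot
    E\bigl\{ (X_0 - E\{X_0 \mid Y_0\})^2 \bigr\}
  = (1-\varepsilon)\,\mathrm{mmse}(\gamma).
```

When $B_0 = 1$ the observation $X_0(\varepsilon) = X_0$ reveals the label and the error is zero; when $B_0 = 0$ the side observation carries no information and the problem reduces to the scalar Gaussian channel of Eq. (3).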
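Proposition 4.2. For any $\lambda > 0$, $\varepsilon \in (0,1]$, let $\gamma_*(\lambda,\varepsilon)$ be the largest non-negative solution of the equation:

$$ \gamma = \lambda\bigl(1 - (1-\varepsilon)\,\mathsf{mmse}(\gamma)\bigr). \tag{55} $$

Further, define $\Psi(\gamma,\lambda,\varepsilon)$ by:

$$ \Psi(\gamma,\lambda,\varepsilon) = \frac{\lambda}{4} + \frac{\gamma^2}{4\lambda} - \frac{\gamma}{2} + \varepsilon\log 2 + (1-\varepsilon)\,\mathsf{I}(\gamma). \tag{56} $$

Then, we have

$$ \lim_{n\to\infty}\frac{1}{n}\,I(X; X(\varepsilon), Y) = \Psi(\gamma_*(\lambda,\varepsilon),\lambda,\varepsilon). \tag{57} $$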

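Using continuity in $\varepsilon$, the last result implies directly a limit result for the mutual information under the Gaussian model, which we single out since it is of independent interest.

Theorem 4.3. For any $\lambda > 0$, let $\gamma_*(\lambda)$ be the largest non-negative solution of the equation:

$$ \gamma = \lambda\bigl(1 - \mathsf{mmse}(\gamma)\bigr). \tag{58} $$

Further, define $\Psi(\gamma,\lambda)$ by:

$$ \Psi(\gamma,\lambda) = \frac{\lambda}{4} + \frac{\gamma^2}{4\lambda} - \frac{\gamma}{2} + \mathsf{I}(\gamma). \tag{59} $$

Then, we have

$$ \lim_{n\to\infty}\frac{1}{n}\,I(X;Y) = \Psi(\gamma_*(\lambda),\lambda). \tag{60} $$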
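The fixed point in Eq. (58) and the value of Eq. (59) are easy to evaluate numerically. The following is a minimal sketch, not part of the original argument. It assumes the standard closed forms $\mathsf{mmse}(\gamma) = 1 - \mathbb{E}\tanh(\gamma+\sqrt{\gamma}Z)^2$ and $\mathsf{I}(\gamma) = \gamma - \mathbb{E}\log\cosh(\gamma+\sqrt{\gamma}Z)$ with $Z\sim\mathsf{N}(0,1)$, approximates the Gaussian expectations by Gauss-Hermite quadrature, and uses illustrative helper names (`mmse`, `mutual_info`, `gamma_star`, `psi`) that do not appear in the paper.

```python
import numpy as np

# Gauss-Hermite nodes/weights for expectations over Z ~ N(0, 1).
_Z, _W = np.polynomial.hermite_e.hermegauss(80)
_W = _W / _W.sum()  # now sum(w * g(z)) approximates E[g(Z)]


def mmse(gamma):
    """Assumed closed form: 1 - E[tanh(gamma + sqrt(gamma) Z)^2]."""
    t = np.tanh(gamma + np.sqrt(gamma) * _Z)
    return 1.0 - np.dot(_W, t ** 2)


def mutual_info(gamma):
    """Assumed closed form: gamma - E[log cosh(gamma + sqrt(gamma) Z)]."""
    u = np.abs(gamma + np.sqrt(gamma) * _Z)
    logcosh = u + np.log1p(np.exp(-2.0 * u)) - np.log(2.0)  # stable log cosh
    return gamma - np.dot(_W, logcosh)


def gamma_star(lam, iters=2000):
    """Largest non-negative solution of gamma = lam * (1 - mmse(gamma)).

    The map is increasing and bounded by lam, so iterating downwards from
    gamma = lam converges monotonically to the largest fixed point.
    """
    g = lam
    for _ in range(iters):
        g = lam * (1.0 - mmse(g))
    return g


def psi(gamma, lam):
    """Eq. (59)."""
    return lam / 4 + gamma ** 2 / (4 * lam) - gamma / 2 + mutual_info(gamma)


g_below, g_above = gamma_star(0.5), gamma_star(2.0)
psi_below, psi_above = psi(g_below, 0.5), psi(g_above, 2.0)
```

Under these assumptions the fixed point vanishes for $\lambda = 0.5$, so that $\Psi = \lambda/4$, while for $\lambda = 2$ it is strictly positive and $\Psi$ falls strictly below $\lambda/4$, in line with the phase transition at $\lambda = 1$ described in the abstract.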


4.2 Approximate Message Passing (AMP) 


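To analyze the Gaussian model Eq. (48) we introduce an approximate message passing (AMP) algorithm that computes estimates $x^t \in \mathbb{R}^n$ at time $t$, which are functions of the observations $Y$, $X(\varepsilon)$. This construction follows the general scheme of AMP algorithms developed in [DMM09, BM11, JM13]. Given a sequence of functions $f_t : \mathbb{R}\times\{-1,0,+1\}\to\mathbb{R}$, we set $x^0 = 0$ and compute

$$ x^{t+1} = \frac{Y}{\sqrt{n}}\, f_t(x^t, X(\varepsilon)) - b_t\, f_{t-1}(x^{t-1}, X(\varepsilon)), \tag{61} $$

$$ b_t = \frac{1}{n}\sum_{i=1}^{n} f_t'(x^t_i, X(\varepsilon)_i). \tag{62} $$

Above (and in the sequel) we extend the function $f_t$ to vectors by applying it component-wise, i.e. $f_t(x^t, X(\varepsilon)) = \bigl(f_t(x^t_1, X(\varepsilon)_1), f_t(x^t_2, X(\varepsilon)_2), \dots, f_t(x^t_n, X(\varepsilon)_n)\bigr)$.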

The AMP iteration above proceeds analogously to the usual power iteration to compute principal eigenvectors, but has an additional memory term $-b_t f_{t-1}(x^{t-1})$. This additional term changes the behavior of the iterates in an important way: unlike the usual power iteration, there is an explicit distributional characterization of the iterates $x^t$ in the limit of large dimension. Namely, for each time $t$ we will show that, approximately, $x^t_i$ is a scaled version of the truth $X_i$ observed through Gaussian noise of a certain variance. We define the following two-parameter recursion, with initialization $\mu_0 = \sigma_0 = 0$, which will be referred to as state evolution:

$$ \mu_{t+1} = \sqrt{\lambda}\;\mathbb{E}\bigl\{X_0\, f_t(\mu_t X_0 + \sigma_t Z_0, X_0(\varepsilon))\bigr\}, \tag{63} $$

$$ \sigma_{t+1}^2 = \mathbb{E}\bigl\{f_t(\mu_t X_0 + \sigma_t Z_0, X_0(\varepsilon))^2\bigr\}, \tag{64} $$

where expectation is with respect to the independent random variables $X_0 \sim \mathrm{Uniform}(\{+1,-1\})$, $Z_0 \sim \mathsf{N}(0,1)$ and $B_0 \sim \mathrm{Ber}(\varepsilon)$, setting $X_0(\varepsilon) = B_0 X_0$.

The following lemma makes this distributional characterization precise. It follows from the more general result of [JM13] and we provide a proof in Appendix B.

Lemma 4.4 (State Evolution). Let $f_t : \mathbb{R}\times\{-1,0,1\}\to\mathbb{R}$ be a sequence of functions such that $f_t$, $f_t'$ are Lipschitz continuous in their first argument (where $f_t'$ denotes the derivative of $f_t$ with respect to the first argument).

Let $\psi : \mathbb{R}\times\{+1,-1\}\times\{+1,0,-1\}\to\mathbb{R}$ be a test function such that $|\psi(x_1,s,r) - \psi(x_2,s,r)| \le C(1+|x_1|+|x_2|)\,|x_1-x_2|$ for all $x_1, x_2, s, r$. Then the following limit holds almost surely for $(X_0, Z_0, X_0(\varepsilon))$ random variables distributed as above:

$$ \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\psi(x^t_i, X_i, X(\varepsilon)_i) = \mathbb{E}\bigl\{\psi(\mu_t X_0 + \sigma_t Z_0, X_0, X_0(\varepsilon))\bigr\}. \tag{65} $$

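Although the above holds for a relatively broad class of functions $f_t$, we are interested in the AMP algorithm for specific functions $f_t$. Specifically, we use the following sequence of functions:

$$ f_t(y,s) = \mathbb{E}\{X_0 \,|\, \mu_t X_0 + \sigma_t Z_0 = y,\; X_0(\varepsilon) = s\}. \tag{66} $$

It is easy to see that $f_t$ satisfy the requirement of Lemma 4.4. We will refer to this version of AMP as Bayes-optimal AMP.

Note that the definition (66) depends itself on $\mu_t$ and $\sigma_t$ defined through Eqs. (63), (64). This recursive definition is perfectly well defined and yields

$$ \mu_{t+1} = \sqrt{\lambda}\;\mathbb{E}\bigl\{X_0\,\mathbb{E}\{X_0 \,|\, \mu_t X_0 + \sigma_t Z_0,\, X(\varepsilon)_0\}\bigr\}, \tag{67} $$

$$ \sigma_{t+1}^2 = \mathbb{E}\bigl\{\mathbb{E}\{X_0 \,|\, \mu_t X_0 + \sigma_t Z_0,\, X(\varepsilon)_0\}^2\bigr\}. \tag{68} $$

Using the fact that $f_t(y,s) = \mathbb{E}\{X_0 \,|\, \mu_t X_0 + \sigma_t Z_0 = y,\, X_0(\varepsilon) = s\}$ is the minimum mean square error estimator, we obtain

$$ \mu_{t+1} = \sqrt{\lambda}\,\sigma_{t+1}^2, \tag{69} $$

$$ \sigma_{t+1}^2 = 1 - (1-\varepsilon)\,\mathsf{mmse}(\lambda\sigma_t^2), \tag{70} $$

where $\mathsf{mmse}(\cdot)$ is the scalar minimum mean square error function recalled above.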

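In other words, the state evolution recursion reduces to a simple one-dimensional recursion that we can write in terms of the variable $\gamma_t \equiv \lambda\sigma_t^2$. We obtain

$$ \gamma_{t+1} = \lambda\bigl(1 - (1-\varepsilon)\,\mathsf{mmse}(\gamma_t)\bigr), \tag{71} $$

$$ \sigma_t^2 = \frac{\gamma_t}{\lambda}, \qquad \mu_t = \frac{\gamma_t}{\sqrt{\lambda}}. \tag{72} $$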
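The one-dimensional recursion (71) can be iterated directly. The sketch below is illustrative only: it assumes the closed form $\mathsf{mmse}(\gamma) = 1 - \mathbb{E}\tanh(\gamma+\sqrt{\gamma}Z)^2$, evaluates it by Gauss-Hermite quadrature, and the function name `state_evolution` and the parameter values are hypothetical choices.

```python
import numpy as np

_Z, _W = np.polynomial.hermite_e.hermegauss(80)
_W = _W / _W.sum()  # normalized Gauss-Hermite weights for Z ~ N(0, 1)


def mmse(gamma):
    """Assumed closed form: 1 - E[tanh(gamma + sqrt(gamma) Z)^2]."""
    t = np.tanh(gamma + np.sqrt(gamma) * _Z)
    return 1.0 - np.dot(_W, t ** 2)


def state_evolution(lam, eps, steps):
    """Return [gamma_0, ..., gamma_steps] from Eq. (71), with gamma_0 = 0."""
    gammas = [0.0]
    for _ in range(steps):
        gammas.append(lam * (1.0 - (1.0 - eps) * mmse(gammas[-1])))
    return np.array(gammas)


lam, eps = 2.0, 0.1
gammas = state_evolution(lam, eps, steps=200)
sigma2 = gammas / lam          # sigma_t^2 = gamma_t / lambda, Eq. (72)
mu = gammas / np.sqrt(lam)     # mu_t = gamma_t / sqrt(lambda), Eq. (72)
```

With $\gamma_0 = 0$ the first step gives $\gamma_1 = \lambda\varepsilon$, since $\mathsf{mmse}(0) = 1$, and the sequence then increases monotonically towards a solution of Eq. (55).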
Our proof strategy uses the AMP algorithm to construct estimates that bound from above the minimum error of estimating $X$ from observations $Y$, $X(\varepsilon)$. However, in the limit of a large number of iterations, we show that the gap between this upper bound and the minimum estimation error vanishes via an area theorem.

More explicitly, we develop an upper bound on the matrix mean square error first introduced in Eq. (14). We generalize this in the obvious way to the Gaussian observation model:

$$ \mathrm{MMSE}(\lambda,\varepsilon,n) \equiv \frac{1}{n^2}\,\mathbb{E}\bigl\{\|XX^{\sf T} - \mathbb{E}\{XX^{\sf T}\,|\,X(\varepsilon), Y\}\|_F^2\bigr\}. \tag{73} $$

(Note that we adopt here a slightly different normalization with respect to Eq. (14). This change is immaterial in the large $n$ limit.)

We then use AMP to construct the sequence of estimators $\widehat{x}^t = f_{t-1}(x^{t-1}, X(\varepsilon))$, indexed by $t \in \{1, 2, \dots\}$, where $f_{t-1}$ is defined as in Eq. (66). The matrix mean squared error of this estimator will be denoted by

$$ \mathrm{MSE}_{\rm AMP}(t;\lambda,\varepsilon,n) \equiv \frac{1}{n^2}\,\mathbb{E}\bigl\{\|XX^{\sf T} - \widehat{x}^t(\widehat{x}^t)^{\sf T}\|_F^2\bigr\}. \tag{74} $$

We also define the limits

$$ \mathrm{MSE}_{\rm AMP}(t;\lambda,\varepsilon) \equiv \lim_{n\to\infty}\mathrm{MSE}_{\rm AMP}(t;\lambda,\varepsilon,n), \tag{75} $$

$$ \mathrm{MSE}_{\rm AMP}(\lambda,\varepsilon) \equiv \lim_{t\to\infty}\mathrm{MSE}_{\rm AMP}(t;\lambda,\varepsilon). \tag{76} $$

In the course of the proof, we will also see that these limits are well-defined, using the state evolution Lemma 4.4.


4.3 Proof of Theorem 1.1 and Theorem 4.3

The proof is almost immediate given Propositions 4.1 and 4.2. Firstly, note that, for any $\varepsilon \in (0,1]$,

$$ \frac{1}{n} I(X; X(\varepsilon), Y) - \frac{1}{n} I(X;Y) \le \frac{1}{n} I(X; X(\varepsilon)\,|\,Y) \le \varepsilon\log 2. \tag{77} $$

Since, by Proposition 4.2, $I(X; X(\varepsilon), Y)/n$ has a well-defined limit as $n\to\infty$, and $\varepsilon > 0$ is arbitrary, we have that:

$$ \lim_{n\to\infty}\frac{1}{n} I(X;Y) = \lim_{\varepsilon\to 0}\Psi(\gamma_*(\lambda,\varepsilon),\lambda,\varepsilon). \tag{78} $$

It is immediate to check that $\Psi(\gamma,\lambda,\varepsilon)$ is continuous in $\varepsilon \ge 0$, $\gamma \ge 0$ and $\Psi(\gamma,\lambda,0) = \Psi(\gamma,\lambda)$ as defined in Theorem 4.3. Furthermore, as $\varepsilon\to 0$, the unique positive solution $\gamma_*(\lambda,\varepsilon)$ of Eq. (55) converges to $\gamma_*(\lambda)$, the largest non-negative solution of Eq. (58), which we copy here for the readers' convenience:

$$ \gamma = \lambda\bigl(1 - \mathsf{mmse}(\gamma)\bigr). \tag{79} $$

This follows from the smoothness and concavity of the function $1 - \mathsf{mmse}(\gamma)$ (see Lemma 6.1). It follows that $\lim_{\varepsilon\to 0}\Psi(\gamma_*(\lambda,\varepsilon),\lambda,\varepsilon) = \Psi(\gamma_*(\lambda),\lambda)$ and therefore

$$ \lim_{n\to\infty}\frac{1}{n} I(X;Y) = \Psi(\gamma_*(\lambda),\lambda). \tag{80} $$

This proves Theorem 4.3. Theorem 1.1 follows by applying Proposition 4.1.
5 Proof of Proposition 4.1

Given a collection $V = (V_{ij})_{i<j}$ of random variables defined on the same probability space as $X$, and a non-negative real number $\lambda$, we define the following Hamiltonian and log-partition function associated with it:

$$ \mathcal{H}(x, X, V, \lambda, n) \equiv \sum_{i<j}\Bigl\{ V_{ij}\,(x_ix_j - X_iX_j) + \frac{\lambda}{n}\, x_ix_jX_iX_j \Bigr\}, \tag{81} $$

$$ \phi(X, V, \lambda, n) \equiv \log\Bigl\{\sum_{x\in\{\pm1\}^n}\exp\bigl(\mathcal{H}(x, X, V, \lambda, n)\bigr)\Bigr\}. \tag{82} $$


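Lemma 5.1. We have the identity:

$$ I(X;Y) = n\log 2 + \frac{(n-1)\lambda}{2} - \mathbb{E}_{X,Z}\bigl\{\phi(X, Z\sqrt{\lambda/n}, \lambda, n)\bigr\}. \tag{83} $$

Proof. By definition:

$$ I(X;Y) = \mathbb{E}\log\Bigl\{\frac{dp_{Y|X}(Y(X)\,|\,X)}{dp_{Y}(Y(X))}\Bigr\}. \tag{84} $$

Since the two distributions $p_{Y|X}$ and $p_Y$ are absolutely continuous with respect to each other, we can write the above simply in terms of the ratio of (Lebesgue) densities, and we obtain:

$$ I(X;Y) = \mathbb{E}\log\Bigl\{\frac{\exp\bigl(-\|Y - \sqrt{\lambda/n}\,XX^{\sf T}\|_F^2/4\bigr)}{\sum_{x\in\{\pm1\}^n} 2^{-n}\exp\bigl(-\|Y - \sqrt{\lambda/n}\,xx^{\sf T}\|_F^2/4\bigr)}\Bigr\} \tag{85} $$

$$ = n\log 2 - \mathbb{E}_{X,Z}\log\Bigl\{\sum_{x\in\{\pm1\}^n}\exp\Bigl(-\frac{1}{2}\sum_{i<j}\Bigl(Z_{ij} + \sqrt{\tfrac{\lambda}{n}}\,(X_iX_j - x_ix_j)\Bigr)^2 + \frac{1}{2}\sum_{i<j}Z_{ij}^2\Bigr)\Bigr\} \tag{86} $$

$$ = n\log 2 - \mathbb{E}_{X,Z}\log\Bigl\{\sum_{x\in\{\pm1\}^n}\exp\Bigl(\sum_{i<j}\sqrt{\tfrac{\lambda}{n}}\,Z_{ij}(x_ix_j - X_iX_j) - \frac{\lambda}{2n}(x_ix_j - X_iX_j)^2\Bigr)\Bigr\}. \tag{87} $$

We modify the final term as follows:

$$ \sum_{i<j}(x_ix_j - X_iX_j)^2 = \sum_{i<j}\bigl(2 - 2x_ix_jX_iX_j\bigr) \tag{88} $$

$$ = n(n-1) - 2\sum_{i<j}x_ix_jX_iX_j. \tag{89} $$

Substituting this in Eq. (87) we have

$$ I(X;Y) = n\log 2 - \mathbb{E}_{X,Z}\log\Bigl\{\sum_{x\in\{\pm1\}^n}\exp\Bigl(\sum_{i<j}\sqrt{\tfrac{\lambda}{n}}\,Z_{ij}(x_ix_j - X_iX_j) + \frac{\lambda}{n}\,x_ix_jX_iX_j\Bigr)\Bigr\} + \frac{1}{2}\lambda(n-1), \tag{90} $$

as required. □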
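The algebra in the proof of Lemma 5.1 can be checked by brute force on a small instance. The sketch below is only a sanity check under illustrative choices (the values of `n` and `lam`, the random seed, and all variable names are hypothetical): it enumerates every $x\in\{\pm1\}^n$ and verifies Eqs. (88)-(89) together with the two rewritings of the exponent that lead from Eq. (86) to Eq. (90).

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, lam = 5, 1.7
X = rng.choice([-1, 1], size=n)          # ground-truth labels
Z = rng.standard_normal((n, n))          # only entries with i < j are used
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
c = np.sqrt(lam / n)
z = np.array([Z[i, j] for i, j in pairs])

checked = 0
for x in itertools.product([-1, 1], repeat=n):
    d = np.array([x[i] * x[j] - X[i] * X[j] for i, j in pairs])
    overlap = sum(x[i] * x[j] * X[i] * X[j] for i, j in pairs)
    # Eqs. (88)-(89): sum of squared differences in closed form.
    assert np.sum(d ** 2) == n * (n - 1) - 2 * overlap
    # Exponent of Eq. (86) equals exponent of Eq. (87).
    e86 = -0.5 * np.sum((z - c * d) ** 2) + 0.5 * np.sum(z ** 2)
    e87 = np.sum(c * z * d - 0.5 * c ** 2 * d ** 2)
    assert abs(e86 - e87) < 1e-10
    # Exponent of Eq. (87) equals the Hamiltonian (81) with V = Z*sqrt(lam/n),
    # shifted by -(n - 1) * lam / 2.
    ham = np.sum(c * z * d) + (lam / n) * overlap
    assert abs(e87 - (ham - 0.5 * lam * (n - 1))) < 1e-10
    checked += 1
```

The constant shift $-(n-1)\lambda/2$ inside the exponent is what produces the term $+\tfrac{1}{2}\lambda(n-1)$ in Eq. (90).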
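Lemma 5.2. Define the (random) Hamiltonian $\mathcal{H}_{\rm SBM}(x, X, G, n)$ by:

$$ \mathcal{H}_{\rm SBM}(x, X, G, n) \equiv \sum_{i<j}\Bigl\{ G_{ij}\log\Bigl(\frac{\bar p_n + \Delta_n x_ix_j}{\bar p_n + \Delta_n X_iX_j}\Bigr) + (1-G_{ij})\log\Bigl(\frac{1-\bar p_n - \Delta_n x_ix_j}{1-\bar p_n - \Delta_n X_iX_j}\Bigr)\Bigr\}. \tag{91} $$

Then we have that:

$$ I(X;G) = n\log 2 - \mathbb{E}_{X,G}\log\Bigl\{\sum_{x\in\{\pm1\}^n}\exp\bigl(\mathcal{H}_{\rm SBM}(x, X, G, n)\bigr)\Bigr\}. \tag{92} $$

Proof. This follows directly from the definition of mutual information:

$$ I(X;G) = \mathbb{E}_{X,G}\log\Bigl\{\frac{dp_{G|X}(G\,|\,X)}{dp_G(G)}\Bigr\}. \tag{93} $$

As in Lemma 5.1 we can write this in terms of densities as:

$$ \frac{dp_{G|X}(G\,|\,X)}{dp_G(G)} = \frac{\prod_{i<j}(\bar p_n + \Delta_n X_iX_j)^{G_{ij}}(1-\bar p_n - \Delta_n X_iX_j)^{1-G_{ij}}}{2^{-n}\sum_{x\in\{\pm1\}^n}\prod_{i<j}(\bar p_n + \Delta_n x_ix_j)^{G_{ij}}(1-\bar p_n - \Delta_n x_ix_j)^{1-G_{ij}}}. \tag{94} $$

Substituting this in the mutual information formula Eq. (93) yields the lemma. □

Define the random variables $\widetilde{G} = (\widetilde{G}_{ij})_{i<j}$ as follows:

$$ \widetilde{G}_{ij} \equiv \frac{\Delta_n}{\bar p_n(1-\bar p_n)}\,\bigl(G_{ij} - \bar p_n - \Delta_n X_iX_j\bigr). \tag{95} $$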


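The following lemma shows that, to compute $I(X;G)$, it suffices to compute the log-partition function with respect to the approximating Hamiltonian.

Lemma 5.3. Assume that, as $n\to\infty$, (i) $\lambda_n \to \lambda$ and (ii) $n\bar p_n(1-\bar p_n)\to\infty$. Then, we have

$$ I(X;G) = n\log 2 + \frac{(n-1)\lambda_n}{2} - \mathbb{E}_{X,G}\bigl\{\phi(X, \widetilde{G}, \lambda_n, n)\bigr\} + O\Bigl(\frac{n\lambda^{3/2}}{\sqrt{n\bar p_n(1-\bar p_n)}}\Bigr). \tag{96} $$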


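Proof. We concentrate on the log-partition function for the Hamiltonian $\mathcal{H}_{\rm SBM}(x, X, G, n)$. First, using the fact that $\log(c + dx) = \frac{1}{2}\log((c+d)(c-d)) + \frac{x}{2}\log((c+d)/(c-d))$ when $x\in\{\pm1\}$:

$$ \mathcal{H}_{\rm SBM}(x, X, G, n) = \sum_{i<j}(x_ix_j - X_iX_j)\Bigl(\frac{G_{ij}}{2}\log\Bigl(\frac{1+\Delta_n/\bar p_n}{1-\Delta_n/\bar p_n}\Bigr) + \frac{1-G_{ij}}{2}\log\Bigl(\frac{1-\Delta_n/(1-\bar p_n)}{1+\Delta_n/(1-\bar p_n)}\Bigr)\Bigr). \tag{97} $$

Now when $\max(\Delta_n/\bar p_n, \Delta_n/(1-\bar p_n)) \le c_0$, for small enough $c_0$, we have by Taylor expansion the following approximation for $z\in[0,c_0]$:

$$ \Bigl|\frac{1}{2}\log\Bigl(\frac{1+z}{1-z}\Bigr) - z\Bigr| \le z^3, \tag{98} $$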
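which implies, by triangle inequality:

$$ \mathcal{H}_{\rm SBM}(x, X, G, n) = \sum_{i<j}(x_ix_j - X_iX_j)\Bigl(\frac{\Delta_n G_{ij}}{\bar p_n} - \frac{\Delta_n(1-G_{ij})}{1-\bar p_n}\Bigr) + \mathrm{err}_n, \tag{99} $$

where

$$ |\mathrm{err}_n| \le C\Delta_n^3\Bigl(\frac{|\langle x, Gx\rangle| + |\langle X, GX\rangle|}{\bar p_n^3} + \frac{|\langle x, (\mathbf{1}\mathbf{1}^{\sf T} - G)x\rangle| + |\langle X, (\mathbf{1}\mathbf{1}^{\sf T} - G)X\rangle|}{(1-\bar p_n)^3}\Bigr). \tag{100} $$

We first simplify the RHS in Eq. (99). Recalling the definition of $\widetilde{G}_{ij}$:

$$ \sum_{i<j}(x_ix_j - X_iX_j)\Bigl(\frac{\Delta_n G_{ij}}{\bar p_n} - \frac{\Delta_n(1-G_{ij})}{1-\bar p_n}\Bigr) = \sum_{i<j}(x_ix_j - X_iX_j)\Bigl(\widetilde{G}_{ij} + \frac{\Delta_n^2 X_iX_j}{\bar p_n(1-\bar p_n)}\Bigr) \tag{101} $$

$$ = -\frac{(n-1)\lambda_n}{2} + \mathcal{H}(x, X, \widetilde{G}, \lambda_n, n). \tag{102} $$

This implies that:

$$ \mathcal{H}_{\rm SBM}(x, X, G, n) = -\frac{(n-1)\lambda_n}{2} + \mathcal{H}(x, X, \widetilde{G}, \lambda_n, n) + \mathrm{err}_n, \tag{103} $$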


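where $\mathrm{err}_n$ satisfies Eq. (100). We now use the following remark, which is a simple application of Bernstein inequality (the proof is deferred to Appendix B.1).

Remark 5.4. There exists a constant $C$ such that for every $n$ large enough:

$$ \mathbb{P}\Bigl\{\sup_{x\in\{\pm1\}^n}|\langle x, Gx\rangle| > Cn^2\bar p_n\Bigr\} \le \exp(-n^2\bar p_n/2)/2, \tag{104} $$

$$ \mathbb{P}\Bigl\{\sup_{x\in\{\pm1\}^n}|\langle x, (\mathbf{1}\mathbf{1}^{\sf T} - G)x\rangle| > Cn^2(1-\bar p_n)\Bigr\} \le \exp(-n^2(1-\bar p_n)/2)/2. \tag{105} $$

Using this Remark, the error bound Eq. (100) and Eq. (103) in Lemma 5.2 yields

$$ I(X;G) = n\log 2 + \frac{(n-1)\lambda_n}{2} - \mathbb{E}_{X,G}\bigl\{\phi(X,\widetilde{G},\lambda_n,n)\bigr\} + O\Bigl(\frac{n^2\Delta_n^3}{\bar p_n^2} + \frac{n^2\Delta_n^3}{(1-\bar p_n)^2}\Bigr) + O\Bigl(\Delta_n^3\Bigl(\frac{n^2\exp(-n^2\bar p_n/2)}{\bar p_n^3} + \frac{n^2\exp(-n^2(1-\bar p_n)/2)}{(1-\bar p_n)^3}\Bigr)\Bigr) \tag{106} $$

$$ = n\log 2 + \frac{(n-1)\lambda_n}{2} - \mathbb{E}_{X,G}\bigl\{\phi(X,\widetilde{G},\lambda_n,n)\bigr\} + O\Bigl(\frac{n^2\Delta_n^3}{\bar p_n^2(1-\bar p_n)^2}\Bigr). \tag{107} $$

Substituting $\Delta_n = (\lambda_n\bar p_n(1-\bar p_n)/n)^{1/2}$ gives the lemma. □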
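We now control the deviations that occur when replacing the variables $\widetilde{G}_{ij}$ with Gaussian variables.

Lemma 5.5. Assume that, as $n\to\infty$, (i) $\lambda_n\to\lambda$ and (ii) $n\bar p_n(1-\bar p_n)\to\infty$. Then we have:

$$ \mathbb{E}_{X,G}\bigl\{\phi(X,\widetilde{G},\lambda_n,n)\bigr\} = \mathbb{E}_{X,Z}\bigl\{\phi(X, Z\sqrt{\lambda/n},\lambda,n)\bigr\} + O\Bigl(\frac{n\lambda^{3/2}}{\sqrt{n\bar p_n(1-\bar p_n)}} + n\,|\lambda_n - \lambda|\Bigr). \tag{108} $$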


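Proof. This proof follows the Lindeberg strategy [Cha06, KM11]. We will show that:

$$ \mathbb{E}\bigl\{\phi(X,\widetilde{G},\lambda_n,n)\,\big|\,X\bigr\} = \mathbb{E}\bigl\{\phi(X, Z\sqrt{\lambda/n},\lambda,n)\,\big|\,X\bigr\} + O\Bigl(\frac{n\lambda^{3/2}}{\sqrt{n\bar p_n(1-\bar p_n)}} + n\,|\lambda_n-\lambda|\Bigr) \tag{109} $$

(with the $O(\cdots)$ term uniform in $X$). The claim then follows by taking expectations on both sides. Note that, by construction:

$$ \mathbb{E}\{\widetilde{G}_{ij}\,|\,X\} = \mathbb{E}\{Z_{ij}\sqrt{\lambda_n/n}\,|\,X\} = 0, \tag{110} $$

$$ \bigl|\mathbb{E}\{\widetilde{G}_{ij}^2\,|\,X\} - \mathbb{E}\{Z_{ij}^2(\lambda_n/n)\,|\,X\}\bigr| = \Bigl|\frac{\Delta_n^2}{\bar p_n^2(1-\bar p_n)^2}\,(\bar p_n + \Delta_n X_iX_j)(1-\bar p_n - \Delta_n X_iX_j) - \frac{\lambda_n}{n}\Bigr| \tag{111} $$

$$ \le \frac{\lambda_n}{n}\Bigl(\sqrt{\frac{\lambda_n}{n\bar p_n(1-\bar p_n)}} + \frac{\lambda_n}{n}\Bigr), \tag{112} $$

and

$$ \mathbb{E}\{|\widetilde{G}_{ij}|^3\,|\,X\} = \frac{\Delta_n^3}{\bar p_n^3(1-\bar p_n)^3}\Bigl((\bar p_n + \Delta_n X_iX_j)(1-\bar p_n - \Delta_n X_iX_j)^3 + (1-\bar p_n - \Delta_n X_iX_j)(\bar p_n + \Delta_n X_iX_j)^3\Bigr) \tag{113} $$

$$ \le \frac{\Delta_n^3}{\bar p_n^3(1-\bar p_n)^3}\,(\bar p_n + \Delta_n X_iX_j)(1-\bar p_n - \Delta_n X_iX_j) \tag{114} $$

$$ \le \frac{\Delta_n^3}{\bar p_n^2(1-\bar p_n)^2} + \frac{\Delta_n^4}{\bar p_n^3(1-\bar p_n)^3} + \frac{\Delta_n^5}{\bar p_n^3(1-\bar p_n)^3} \tag{115} $$

$$ \le \frac{3\lambda_n^{3/2}}{n\,(n\bar p_n(1-\bar p_n))^{1/2}}. \tag{116} $$

We now derive the following estimates:

$$ |\partial_{ij}^r\phi(X, z, \lambda_n, n)| \le C, \quad \text{for } r = 1, 2, 3. \tag{117} $$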
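The moment computations that feed the Lindeberg argument can be reproduced exactly for a single pair $(i,j)$, since $\widetilde{G}_{ij}$ takes only two values given $X$. The sketch below is illustrative: the function name and the parameter values are hypothetical, and it assumes $\Delta_n = \sqrt{\lambda_n\bar p_n(1-\bar p_n)/n}$ and $G_{ij}\sim\mathrm{Ber}(\bar p_n + \Delta_n X_iX_j)$ as in Eq. (95).

```python
import math


def rescaled_bernoulli_moments(p_bar, lam_n, n, s):
    """First, second and absolute third moment of G~_ij given X_i X_j = s."""
    delta = math.sqrt(lam_n * p_bar * (1 - p_bar) / n)
    prob_edge = p_bar + delta * s                 # P(G_ij = 1 | X)
    scale = delta / (p_bar * (1 - p_bar))
    # G~_ij equals scale * (1 - prob_edge) on an edge, -scale * prob_edge otherwise.
    values = (scale * (1 - prob_edge), -scale * prob_edge)
    probs = (prob_edge, 1 - prob_edge)
    m1 = sum(p * v for p, v in zip(probs, values))
    m2 = sum(p * v ** 2 for p, v in zip(probs, values))
    m3 = sum(p * abs(v) ** 3 for p, v in zip(probs, values))
    return m1, m2, m3


p_bar, lam_n, n = 0.05, 2.0, 10 ** 5
ratio = math.sqrt(lam_n / (n * p_bar * (1 - p_bar)))
second_moment_bound = (lam_n / n) * (ratio + lam_n / n)
third_moment_bound = 3 * lam_n ** 1.5 / (n * math.sqrt(n * p_bar * (1 - p_bar)))
moments = {s: rescaled_bernoulli_moments(p_bar, lam_n, n, s) for s in (+1, -1)}
```

For these values the mean vanishes, the variance differs from the Gaussian variance $\lambda_n/n$ by a relative amount of order $\sqrt{\lambda_n/(n\bar p_n(1-\bar p_n))}$, and the third absolute moment is of order $\lambda_n^{3/2}/(n\sqrt{n\bar p_n(1-\bar p_n)})$.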


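Here $\partial_{ij}^r\phi(X, z, \lambda_n, n)$ is the $r$-fold derivative of $\phi(X, z, \lambda_n, n)$ in the entry $z_{ij}$ of the matrix $z$. To write explicitly the derivatives $\partial_{ij}\phi(X, z, \lambda_n, n)$ we introduce some notation. For a function $f : \{\pm1\}^n\to\mathbb{R}$, we write $\langle f\rangle_z$ to denote its expectation with respect to the measure defined by the Hamiltonian $\mathcal{H}(x, X, z, \lambda_n, n)$. Explicitly:

$$ \langle f\rangle_z \equiv e^{-\phi(X, z, \lambda_n, n)}\sum_{x\in\{\pm1\}^n} f(x)\exp\bigl(\mathcal{H}(x, X, z, \lambda_n, n)\bigr). \tag{118} $$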
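Then the partial derivatives above can be expressed as

$$ \partial_{ij}\phi(X, z, \lambda_n, n) = \langle x_ix_j - X_iX_j\rangle_z, \tag{119} $$

$$ \partial_{ij}^2\phi(X, z, \lambda_n, n) = \langle (x_ix_j - X_iX_j)^2\rangle_z - \langle x_ix_j - X_iX_j\rangle_z^2, \tag{120} $$

$$ \partial_{ij}^3\phi(X, z, \lambda_n, n) = \langle (x_ix_j - X_iX_j)^3\rangle_z - 3\langle x_ix_j - X_iX_j\rangle_z\,\langle (x_ix_j - X_iX_j)^2\rangle_z \tag{121} $$

$$ \qquad + 2\langle x_ix_j - X_iX_j\rangle_z^3. \tag{122} $$

However since $|x_ix_j - X_iX_j| \le 2$, we obtain:

$$ |\partial_{ij}^r\phi(X, z, \lambda_n, n)| \le C. \tag{123} $$

Applying Theorem 2 of [KM11] (stated below as Theorem 5.6) gives:

$$ \mathbb{E}\bigl\{\phi(X,\widetilde{G},\lambda_n,n)\,\big|\,X\bigr\} = \mathbb{E}\bigl\{\phi(X, Z\sqrt{\lambda_n/n},\lambda_n,n)\,\big|\,X\bigr\} + O\Bigl(\frac{n\lambda_n^{3/2}}{\sqrt{n\bar p_n(1-\bar p_n)}}\Bigr). \tag{124} $$

Further, we have:

$$ \bigl|\partial_\lambda\,\mathbb{E}\{\phi(X, Z\sqrt{\lambda/n},\lambda,n)\,|\,X\}\bigr| = \frac{1}{n}\,\Bigl|\mathbb{E}\Bigl\{\sum_{i<j}\langle x_ix_jX_iX_j\rangle_\lambda\,\Big|\,X\Bigr\}\Bigr| \tag{125} $$

$$ \le \frac{n}{2}. \tag{126} $$

Here $\partial_\lambda$ denotes the derivative with respect to the variable $\lambda$. Thus,

$$ \mathbb{E}\bigl\{\phi(X, Z\sqrt{\lambda_n/n},\lambda_n,n)\,\big|\,X\bigr\} = \mathbb{E}\bigl\{\phi(X, Z\sqrt{\lambda/n},\lambda,n)\,\big|\,X\bigr\} + O(n\,|\lambda_n-\lambda|). \tag{127} $$

Combining Eqs. (124), (127) gives Eq. (109), and the lemma follows by taking expectations on either side. □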

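We state below the Lindeberg generalization theorem for convenience:

Theorem 5.6 (Theorem 2 in [KM11]). Suppose we are given two collections of random variables $(U_i)_{i\in[N]}$, $(V_i)_{i\in[N]}$ with independent components and a function $f : \mathbb{R}^N\to\mathbb{R}$. Let $a_i = |\mathbb{E}\{U_i\} - \mathbb{E}\{V_i\}|$ and $b_i = |\mathbb{E}\{U_i^2\} - \mathbb{E}\{V_i^2\}|$. Then:

$$ |\mathbb{E}\{f(U)\} - \mathbb{E}\{f(V)\}| \le \sum_{i=1}^{N}\Bigl( a_i\,\mathbb{E}\{|\partial_i f(U_1^{i-1}, 0, V_{i+1}^N)|\} + \frac{b_i}{2}\,\mathbb{E}\{|\partial_i^2 f(U_1^{i-1}, 0, V_{i+1}^N)|\} $$

$$ \qquad + \frac{1}{2}\,\mathbb{E}\int_0^{U_i}|\partial_i^3 f(U_1^{i-1}, s, V_{i+1}^N)|\,(U_i - s)^2\,ds + \frac{1}{2}\,\mathbb{E}\int_0^{V_i}|\partial_i^3 f(U_1^{i-1}, s, V_{i+1}^N)|\,(V_i - s)^2\,ds\Bigr). \tag{128} $$

With these in hand, we can now prove Proposition 4.1.

Proof of Proposition 4.1. The proposition follows simply by combining the formulae for $I(X;G)$, $I(X;Y)$ in Lemmas 5.1, 5.3 with the approximation guarantee of Lemma 5.5. □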
6 Proof of Proposition 4.2

Throughout this section we will write $Y(\lambda)$ whenever we want to emphasize the dependence of the law of $Y$ on the signal to noise parameter $\lambda$.

The proof of Proposition 4.2 follows essentially from a few preliminary lemmas.

6.1 Auxiliary lemmas

We begin with some properties of the fixed point equation (55). The proof of this lemma can be found in Appendix B.2.

Lemma 6.1. For any $\varepsilon\in[0,1]$, the following properties hold for the function $\gamma\mapsto(1-\mathsf{mmse}(\gamma,\varepsilon)) = (1-(1-\varepsilon)\mathsf{mmse}(\gamma))$:

(a) It is continuous, monotone increasing and concave in $\gamma\in\mathbb{R}_{\ge0}$.

(b) It satisfies the following limit behaviors

$$ 1 - \mathsf{mmse}(0,\varepsilon) = \varepsilon, \tag{129} $$

$$ \lim_{\gamma\to\infty}\bigl[1 - \mathsf{mmse}(\gamma,\varepsilon)\bigr] = 1. \tag{130} $$

As a consequence we have the following for all $\varepsilon\in(0,1]$:

(c) A non-negative solution $\gamma_*(\lambda,\varepsilon)$ of Eq. (55) exists and is unique for all $\varepsilon>0$.

(d) For any $\varepsilon>0$, the function $\lambda\mapsto\gamma_*(\lambda,\varepsilon)$ is differentiable in $\lambda$.

(e) Let $\{\gamma_t\}_{t\ge0}$ be defined recursively by Eq. (71), with initialization $\gamma_0 = 0$. Then $\lim_{t\to\infty}\gamma_t = \gamma_*(\lambda,\varepsilon)$.

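We then compute the value of $\Psi(\gamma_*(\lambda,\varepsilon),\lambda,\varepsilon)$ at $\lambda = \infty$ and $\lambda = 0$.

Lemma 6.2. For any $\varepsilon>0$,

$$ \lim_{\lambda\to0}\Psi(\gamma_*(\lambda,\varepsilon),\lambda,\varepsilon) = \varepsilon\log 2, \tag{131} $$

$$ \lim_{\lambda\to\infty}\Psi(\gamma_*(\lambda,\varepsilon),\lambda,\varepsilon) = \log 2. \tag{132} $$

Proof. Recall the definition of $\mathsf{mmse}(\gamma)$. Upper bounding $\mathsf{mmse}(\gamma)$ by the minimum error obtained by a linear estimator yields, for any $\gamma\ge0$, $0\le\mathsf{mmse}(\gamma)\le1/(1+\gamma)$. Substituting these bounds in Eq. (55), we obtain

$$ \max(0,\gamma_{\rm LB}(\lambda,\varepsilon)) \le \gamma_*(\lambda,\varepsilon) \le \lambda, \tag{133} $$

$$ \gamma_{\rm LB}(\lambda,\varepsilon) = \frac{1}{2}\Bigl[\lambda - 1 + \sqrt{(\lambda-1)^2 + 4\lambda\varepsilon}\Bigr] = \lambda - (1-\varepsilon) + O(\lambda^{-1}), \tag{134} $$

where the last expansion holds as $\lambda\to\infty$.

Let us now consider the limit $\lambda\to0$, cf. claim (131). Considering Eq. (56), and using $0\le\gamma_*(\lambda,\varepsilon)\le\lambda$, we have $(\lambda/4) + (\gamma_*(\lambda,\varepsilon)^2/(4\lambda)) - (\gamma_*/2) = O(\lambda)\to0$. Further, from the definition of $\mathsf{I}(\gamma)$ it follows¹ that $\lim_{\gamma\to0}\mathsf{I}(\gamma) = 0$, thus yielding Eq. (131).

¹ This follows either by general information theoretic arguments, or else using dominated convergence in Eq. (6).

Consider next the $\lambda\to\infty$ limit of Eq. (132). In this limit Eq. (133) implies $\gamma_*(\lambda,\varepsilon) = \lambda + \delta_*\to\infty$, where $\delta_* = \delta_*(\lambda,\varepsilon) = O(1)$. Hence $\lim_{\lambda\to\infty}\mathsf{I}(\gamma_*(\lambda,\varepsilon)) = \lim_{\gamma\to\infty}\mathsf{I}(\gamma) = \log 2$ (this follows again from the definition of $\mathsf{I}(\gamma)$). Further

$$ \frac{\lambda}{4} + \frac{\gamma_*^2}{4\lambda} - \frac{\gamma_*}{2} = \frac{\delta_*^2}{4\lambda} = o(1). \tag{135} $$

Substituting in Eq. (56) we obtain the desired claim. □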


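The next lemma characterizes the limiting matrix mean squared error of the AMP estimates.

Lemma 6.3. Let $\{\gamma_t\}_{t\ge0}$ be defined recursively by Eq. (71) with initialization $\gamma_0 = 0$, and recall that $\gamma_*(\lambda,\varepsilon)$ denotes the unique non-negative solution of Eq. (55). Then the following limits hold for the AMP mean square error:

$$ \mathrm{MSE}_{\rm AMP}(t;\lambda,\varepsilon) \equiv \lim_{n\to\infty}\mathrm{MSE}_{\rm AMP}(t;\lambda,\varepsilon,n) = 1 - \frac{\gamma_t^2}{\lambda^2}, \tag{136} $$

$$ \mathrm{MSE}_{\rm AMP}(\lambda,\varepsilon) \equiv \lim_{t\to\infty}\mathrm{MSE}_{\rm AMP}(t;\lambda,\varepsilon) = 1 - \frac{\gamma_*^2}{\lambda^2}. \tag{137} $$

Proof. Note that Eq. (137) follows from Eq. (136) using Lemma 6.1, point (e), which gives $\gamma_t\to\gamma_*(\lambda,\varepsilon)$. We will therefore focus on proving Eq. (136).

First notice that:

$$ \mathrm{MSE}_{\rm AMP}(t;\lambda,\varepsilon,n) = \frac{1}{n^2}\,\mathbb{E}\bigl\{\|XX^{\sf T} - \widehat{x}^t(\widehat{x}^t)^{\sf T}\|_F^2\bigr\} \tag{138} $$

$$ = \mathbb{E}\Bigl\{\frac{\|X\|_2^4}{n^2} + \frac{\|\widehat{x}^t\|_2^4}{n^2} - \frac{2\langle X,\widehat{x}^t\rangle^2}{n^2}\Bigr\}. \tag{139} $$

Since $\|X\|_2^2 = n$, the first term evaluates to 1. We use Lemma 4.4 to deal with the final two terms.

Consider the last term $\langle X,\widehat{x}^t\rangle^2/n^2$. Using Lemma 4.4 with $\psi(x,s,r) = f_{t-1}(x,r)\,s$ we have, almost surely

$$ \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} f_{t-1}(x^{t-1}_i, X(\varepsilon)_i)\,X_i = \mathbb{E}\bigl\{X_0\,\mathbb{E}\{X_0\,|\,\mu_{t-1}X_0 + \sigma_{t-1}Z_0,\,X(\varepsilon)_0\}\bigr\} \tag{140} $$

$$ = \frac{\mu_t}{\sqrt{\lambda}} = \frac{\gamma_t}{\lambda}. \tag{141} $$

Note also that $|X_i|$ and $|\widehat{x}^t_i|$ are bounded by 1, hence so is $\langle X,\widehat{x}^t\rangle/n$. It follows from the bounded convergence theorem that

$$ \lim_{n\to\infty}\mathbb{E}\Bigl\{\frac{\langle X,\widehat{x}^t\rangle^2}{n^2}\Bigr\} = \frac{\gamma_t^2}{\lambda^2}. \tag{142} $$

In a similar manner, we have that $\lim_{n\to\infty}\mathbb{E}\{\|\widehat{x}^t\|_2^4/n^2\} = \gamma_t^2/\lambda^2$, whence the thesis follows. □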
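A small simulation can illustrate Eq. (136). The sketch below is not the authors' code: it runs the iteration (61)-(62) on one synthetic instance of the Gaussian model and compares the empirical matrix error with $1-\gamma_t^2/\lambda^2$. It assumes the closed form $\mathsf{mmse}(\gamma) = 1-\mathbb{E}\tanh(\gamma+\sqrt{\gamma}Z)^2$, and it uses the relation $\mu_t = \sqrt{\lambda}\sigma_t^2$ of Eq. (69) to write the denoiser (66) as $\tanh(\sqrt{\lambda}\,y)$ on erased coordinates and as the revealed label otherwise. The name `run_amp` and all parameter values are hypothetical.

```python
import numpy as np

_Z, _W = np.polynomial.hermite_e.hermegauss(80)
_W = _W / _W.sum()  # normalized Gauss-Hermite weights for Z ~ N(0, 1)


def mmse(gamma):
    """Assumed closed form: 1 - E[tanh(gamma + sqrt(gamma) Z)^2]."""
    t = np.tanh(gamma + np.sqrt(gamma) * _Z)
    return 1.0 - np.dot(_W, t ** 2)


def run_amp(n=2000, lam=4.0, eps=0.2, steps=12, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=n)
    revealed = rng.random(n) < eps                 # B_i ~ Ber(eps)
    X_eps = np.where(revealed, X, 0.0)
    Z = np.triu(rng.standard_normal((n, n)), 1)
    Z = Z + Z.T                                    # symmetric noise, zero diagonal
    A = (np.sqrt(lam / n) * np.outer(X, X) + Z) / np.sqrt(n)   # Y / sqrt(n)

    def denoise(x, first):
        """Return f_t(x, X(eps)) and its derivative in the first argument."""
        if first:                                  # x^0 = 0 carries no information
            return X_eps.copy(), np.zeros(n)
        th = np.tanh(np.sqrt(lam) * x)
        val = np.where(revealed, X_eps, th)
        der = np.where(revealed, 0.0, np.sqrt(lam) * (1.0 - th ** 2))
        return val, der

    x, f_prev, gamma = np.zeros(n), np.zeros(n), 0.0
    for t in range(steps):
        f_cur, d_cur = denoise(x, first=(t == 0))
        x = A @ f_cur - d_cur.mean() * f_prev      # Eqs. (61)-(62)
        f_prev = f_cur                             # this is x_hat^{t+1}
        gamma = lam * (1.0 - (1.0 - eps) * mmse(gamma))   # Eq. (71)

    x_hat = f_prev                                 # x_hat^{steps}, matched to gamma_{steps}
    q = np.dot(X, x_hat) / n
    r = np.dot(x_hat, x_hat) / n
    empirical = 1.0 - 2.0 * q ** 2 + r ** 2        # (1/n^2) ||XX^T - x_hat x_hat^T||_F^2
    predicted = 1.0 - (gamma / lam) ** 2           # Eq. (136)
    return empirical, predicted


empirical_mse, predicted_mse = run_amp()
```

On a single instance of this size the two numbers should agree only up to finite-size fluctuations; the statement of Lemma 6.3 concerns the limit $n\to\infty$.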
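Lemma 6.4. For every $\lambda\ge0$ and $\varepsilon>0$:

$$ \Psi(\gamma_*(\lambda,\varepsilon),\lambda,\varepsilon) = \varepsilon\log 2 + \frac{1}{4}\int_0^{\lambda}\mathrm{MSE}_{\rm AMP}(\lambda',\varepsilon)\,d\lambda'. \tag{143} $$

Proof. By differentiating Eq. (56) we obtain (recall $\frac{d\mathsf{I}(\gamma)}{d\gamma} = (1/2)\,\mathsf{mmse}(\gamma)$):

$$ \frac{\partial\Psi(\gamma,\lambda,\varepsilon)}{\partial\gamma}\Big|_{\gamma=\gamma_*} = 0, \tag{144} $$

$$ \frac{\partial\Psi(\gamma,\lambda,\varepsilon)}{\partial\lambda}\Big|_{\gamma=\gamma_*} = \frac{1}{4}\Bigl(1 - \frac{\gamma_*^2}{\lambda^2}\Bigr). \tag{145} $$

It follows from the uniqueness and differentiability of $\gamma_*(\lambda,\varepsilon)$ (cf. Lemma 6.1) that $\lambda\mapsto\Psi(\gamma_*(\lambda,\varepsilon),\lambda,\varepsilon)$ is differentiable for any fixed $\varepsilon>0$, with derivative

$$ \frac{d\Psi}{d\lambda}(\gamma_*(\lambda,\varepsilon),\lambda,\varepsilon) = \frac{1}{4}\Bigl(1 - \frac{\gamma_*^2}{\lambda^2}\Bigr). \tag{146} $$

The lemma follows from the fundamental theorem of calculus using Lemma 6.2 for $\lambda = 0$, and Lemma 6.3, cf. Eq. (137). □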
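The area identity (143) lends itself to a direct numerical check. The sketch below is illustrative and rests on stated assumptions: the closed forms $\mathsf{mmse}(\gamma) = 1-\mathbb{E}\tanh(\gamma+\sqrt{\gamma}Z)^2$ and $\mathsf{I}(\gamma) = \gamma-\mathbb{E}\log\cosh(\gamma+\sqrt{\gamma}Z)$, Gauss-Hermite quadrature for the expectations, and a trapezoidal rule on a hypothetical grid of $\lambda'$ values. All function names are illustrative.

```python
import numpy as np

_Z, _W = np.polynomial.hermite_e.hermegauss(80)
_W = _W / _W.sum()  # normalized Gauss-Hermite weights for Z ~ N(0, 1)


def mmse(g):
    """Vectorized assumed closed form of mmse(gamma)."""
    g = np.atleast_1d(np.asarray(g, dtype=float))
    t = np.tanh(g[:, None] + np.sqrt(g)[:, None] * _Z[None, :])
    return 1.0 - (t ** 2) @ _W


def mutual_info(g):
    """Vectorized assumed closed form of I(gamma)."""
    g = np.atleast_1d(np.asarray(g, dtype=float))
    u = np.abs(g[:, None] + np.sqrt(g)[:, None] * _Z[None, :])
    logcosh = u + np.log1p(np.exp(-2.0 * u)) - np.log(2.0)
    return g - logcosh @ _W


def gamma_star(lam, eps, iters=500):
    """Iterate Eq. (71) from gamma_0 = 0 for every lambda in the array."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    g = np.zeros_like(lam)
    for _ in range(iters):
        g = lam * (1.0 - (1.0 - eps) * mmse(g))
    return g


def psi(g, lam, eps):
    """Eq. (56)."""
    return (lam / 4 + g ** 2 / (4 * lam) - g / 2
            + eps * np.log(2.0) + (1 - eps) * mutual_info(g))


eps, lam_max = 0.3, 2.0
grid = np.linspace(1e-6, lam_max, 2001)
g_grid = gamma_star(grid, eps)
integrand = 1.0 - (g_grid / grid) ** 2            # limiting MSE_AMP, Eq. (137)
area = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid))
lhs = psi(g_grid[-1:], grid[-1:], eps)[0]         # left-hand side of Eq. (143)
rhs = eps * np.log(2.0) + 0.25 * area             # right-hand side of Eq. (143)
```

The grid starts slightly above zero to avoid dividing by $\lambda' = 0$; the omitted interval contributes a negligible amount to the integral.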


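6.2 Proof of Proposition 4.2

We are now in a position to prove Proposition 4.2. We start from a simple remark, proved in Appendix B.3.

Remark 6.5. We have

$$ \bigl|I(X; Y(\lambda), X(\varepsilon)) - I(XX^{\sf T}; Y(\lambda), X(\varepsilon))\bigr| \le \log 2. \tag{147} $$

Further the asymptotic mutual information satisfies

$$ \lim_{n\to\infty}\frac{1}{n}\,I(XX^{\sf T}; X(\varepsilon), Y(0)) = \varepsilon\log 2, \tag{148} $$

$$ \lim_{\lambda\to\infty}\liminf_{n\to\infty}\frac{1}{n}\,I(XX^{\sf T}; X(\varepsilon), Y(\lambda)) = \log 2. \tag{149} $$

We defer the proof of these facts to Appendix B.3.

Applying the (conditional) I-MMSE identity of [GSV05] we have

$$ \frac{1}{n}\frac{\partial I(XX^{\sf T}; Y(\lambda), X(\varepsilon))}{\partial\lambda} = \frac{1}{2n^2}\sum_{i<j}\mathbb{E}\bigl\{(X_iX_j - \mathbb{E}\{X_iX_j\,|\,X(\varepsilon), Y(\lambda)\})^2\bigr\} \tag{150} $$

$$ = \frac{1}{4}\,\mathrm{MMSE}(\lambda,\varepsilon,n) \tag{151} $$

$$ \le \frac{1}{4n^2}\,\mathbb{E}\bigl\{\|XX^{\sf T} - \widehat{x}^t(\widehat{x}^t)^{\sf T}\|_F^2\bigr\} \tag{152} $$

$$ = \frac{1}{4}\,\mathrm{MSE}_{\rm AMP}(t;\lambda,\varepsilon,n). \tag{153} $$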
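We therefore have

$$ (1-\varepsilon)\log 2 \overset{(a)}{=} \lim_{\lambda\to\infty}\liminf_{n\to\infty}\frac{1}{n}\bigl[I(X; X(\varepsilon), Y(\lambda)) - I(X; X(\varepsilon), Y(0))\bigr] \tag{154} $$

$$ \overset{(b)}{=} \lim_{\lambda\to\infty}\liminf_{n\to\infty}\frac{1}{4}\int_0^{\lambda}\mathrm{MMSE}(\lambda',\varepsilon,n)\,d\lambda' \tag{155} $$

$$ \overset{(c)}{\le} \lim_{\lambda\to\infty}\lim_{t\to\infty}\limsup_{n\to\infty}\frac{1}{4}\int_0^{\lambda}\mathrm{MSE}_{\rm AMP}(t;\lambda',\varepsilon,n)\,d\lambda' \tag{156} $$

$$ \overset{(d)}{=} \lim_{\lambda\to\infty}\lim_{t\to\infty}\frac{1}{4}\int_0^{\lambda}\mathrm{MSE}_{\rm AMP}(t;\lambda',\varepsilon)\,d\lambda', \tag{157} $$

where (a) follows from Remark 6.5, (b) from Eq. (147) and (151), (c) from (153), and (d) from bounded convergence. Continuing from the previous chain we get

$$ (\cdots) \overset{(e)}{=} \lim_{\lambda\to\infty}\frac{1}{4}\int_0^{\lambda}\mathrm{MSE}_{\rm AMP}(\lambda',\varepsilon)\,d\lambda' \tag{158} $$

$$ \overset{(f)}{=} \lim_{\lambda\to\infty}\bigl[\Psi(\gamma_*(\lambda,\varepsilon),\lambda,\varepsilon) - \Psi(\gamma_*(0,\varepsilon),0,\varepsilon)\bigr] \tag{159} $$

$$ \overset{(g)}{=} (1-\varepsilon)\log 2, \tag{160} $$

where (e) follows from Lemma 6.3, (f) from Lemma 6.4 and (g) from Lemma 6.2.

We therefore have a chain of equalities, whence the inequality (c) must hold with equality. Since $\mathrm{MMSE}(\lambda,\varepsilon,n)\le\mathrm{MSE}_{\rm AMP}(t;\lambda,\varepsilon,n)$ for any $\lambda$, this implies

$$ \mathrm{MSE}_{\rm AMP}(\lambda,\varepsilon) = \lim_{t\to\infty}\lim_{n\to\infty}\mathrm{MSE}_{\rm AMP}(t;\lambda,\varepsilon,n) = \lim_{n\to\infty}\mathrm{MMSE}(\lambda,\varepsilon,n) \tag{161} $$

for almost every $\lambda$. The conclusion follows for every $\lambda$ by the monotonicity of $\lambda\mapsto\mathrm{MMSE}(\lambda,\varepsilon,n)$, and the continuity of $\mathrm{MSE}_{\rm AMP}(\lambda,\varepsilon)$.

Using again Remark 6.5 and the last display, we get that the following limit exists

$$ \lim_{n\to\infty}\frac{1}{n}\,I(X; X(\varepsilon), Y(\lambda)) = \lim_{n\to\infty}\frac{1}{n}\,I(XX^{\sf T}; X(\varepsilon), Y(\lambda)) \tag{162} $$

$$ = \varepsilon\log 2 + \lim_{n\to\infty}\frac{1}{4}\int_0^{\lambda}\mathrm{MMSE}(\lambda',\varepsilon,n)\,d\lambda' \tag{163} $$

$$ = \varepsilon\log 2 + \frac{1}{4}\int_0^{\lambda}\mathrm{MSE}_{\rm AMP}(\lambda',\varepsilon)\,d\lambda' \tag{164} $$

$$ = \Psi(\gamma_*(\lambda,\varepsilon),\lambda,\varepsilon), \tag{165} $$

where we used Lemma 6.4 in the last step. This concludes the proof.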
7 Proof of Theorem 1.4

7.1 A general differentiation formula

In this section we recall a general formula to compute the derivative of the conditional entropy $H(X|G)$ with respect to noise parameters. The formula was proved in [MMRU04] and [MMRU09, Lemma 2]. We restate it in the present context and present a self-contained proof for the reader's convenience.

We consider the following setting. For $n$ an integer, denote by $\binom{[n]}{2}$ the set of unordered pairs in $[n]$ (in particular $|\binom{[n]}{2}| = \binom{n}{2}$). We will use $e, e_1, e_2, \dots$ to denote elements of $\binom{[n]}{2}$. For each $e = (i,j)$ we are given a one-parameter family of discrete noisy channels indexed by $\theta\in J$ (with $J = (a,b)$ a non-empty interval), with input alphabet $\{+1,-1\}$ and finite output alphabet $\mathcal{Y}$. Concretely, for any $e$, we have a transition probability

$$ \{p_{e,\theta}(y\,|\,x)\}_{x\in\{+1,-1\},\,y\in\mathcal{Y}}, \tag{166} $$

which is differentiable in $\theta$. We shall omit the subscript $\theta$ since it will be clear from the context. We then consider $X = (X_1, X_2, \dots, X_n)$ a random vector in $\{+1,-1\}^n$, and $Y = (Y_{ij})_{(i,j)\in\binom{[n]}{2}}$ a set of observations in $\mathcal{Y}^{\binom{[n]}{2}}$ that are conditionally independent given $X$. Further $Y_{ij}$ is the noisy observation of $X_iX_j$ through the channel $p_{ij}(\,\cdot\,|\,\cdot\,)$. In formulae, the joint probability density function of $X$ and $Y$ is

$$ p_{X,Y}(x,y) = p_X(x)\prod_{(i,j)\in\binom{[n]}{2}} p_{ij}(y_{ij}\,|\,x_ix_j). \tag{167} $$

This obviously includes the two-groups stochastic block model as a special case, if we take $p_X(\cdot)$ to be the uniform distribution over $\{+1,-1\}^n$, and output alphabet $\mathcal{Y} = \{0,1\}$. In that case $Y = G$ is just the adjacency matrix of the graph.

In the following we write $Y_{-e} = (Y_{e'})_{e'\in\binom{[n]}{2}\setminus e}$ for the set of observations excluding $e$, and $X_e = X_iX_j$ for $e = (i,j)$.

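Lemma 7.1 ([MMRU09]). With the above notation, we have:

$$ \frac{dH(X|Y)}{d\theta} = \sum_{e\in\binom{[n]}{2}}\sum_{x_e,y_e}\frac{dp_e(y_e|x_e)}{d\theta}\,\mathbb{E}\Bigl\{p_{X_e|Y_{-e}}(x_e|Y_{-e})\log\Bigl[\sum_{x_e'}\frac{p_e(y_e|x_e')}{p_e(y_e|x_e)}\,p_{X_e|Y_{-e}}(x_e'|Y_{-e})\Bigr]\Bigr\}. \tag{168} $$

Proof. Fix $e\in\binom{[n]}{2}$. By linearity of differentiation, it is sufficient to prove the claim when only $p_e(\,\cdot\,|\,\cdot\,)$ depends on $\theta$.

Writing $H(X, Y_e|Y_{-e})$ by chain rule in two alternative ways we get

$$ H(X|Y) + H(Y_e|Y_{-e}) = H(X|Y_{-e}) + H(Y_e|X, Y_{-e}) \tag{169} $$

$$ = H(X|Y_{-e}) + H(Y_e|X_e), \tag{170} $$

where in the last identity we used the conditional independence of $Y_e$ from $X$, $Y_{-e}$, given $X_e$. Differentiating with respect to $\theta$, and using the fact that $H(X|Y_{-e})$ is independent of $p_e(\,\cdot\,|\,\cdot\,)$, we get

$$ \frac{dH(X|Y)}{d\theta} = \frac{dH(Y_e|X_e)}{d\theta} - \frac{dH(Y_e|Y_{-e})}{d\theta}. \tag{171} $$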
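Consider the first term. Singling out the dependence of $H(Y_e|X_e)$ on $p_e$ we get

$$ \frac{dH(Y_e|X_e)}{d\theta} = -\frac{d}{d\theta}\sum_{y_e}\mathbb{E}\bigl\{p_e(y_e|X_e)\log p_e(y_e|X_e)\bigr\} \tag{172} $$

$$ = -\sum_{y_e}\mathbb{E}\Bigl\{\frac{dp_e(y_e|X_e)}{d\theta}\log p_e(y_e|X_e)\Bigr\} \tag{173} $$

$$ = -\sum_{x_e,y_e}\frac{dp_e(y_e|x_e)}{d\theta}\,p_{X_e}(x_e)\log p_e(y_e|x_e). \tag{174} $$

In the second line we used the fact that the distribution of $X_e$ is independent of $\theta$, and the normalization condition $\sum_{y_e}\frac{dp_e(y_e|x_e)}{d\theta} = 0$.

We follow the same steps for the second term in (171):

$$ \frac{dH(Y_e|Y_{-e})}{d\theta} = -\frac{d}{d\theta}\sum_{y_e}\mathbb{E}\Bigl\{\Bigl[\sum_{x_e}p_{X_e,Y_e|Y_{-e}}(x_e,y_e|Y_{-e})\Bigr]\log\Bigl[\sum_{x_e'}p_{X_e,Y_e|Y_{-e}}(x_e',y_e|Y_{-e})\Bigr]\Bigr\} \tag{175} $$

$$ = -\frac{d}{d\theta}\sum_{y_e}\mathbb{E}\Bigl\{\Bigl[\sum_{x_e}p_e(y_e|x_e)\,p_{X_e|Y_{-e}}(x_e|Y_{-e})\Bigr]\log\Bigl[\sum_{x_e'}p_e(y_e|x_e')\,p_{X_e|Y_{-e}}(x_e'|Y_{-e})\Bigr]\Bigr\} \tag{176} $$

$$ = -\sum_{x_e,y_e}\frac{dp_e(y_e|x_e)}{d\theta}\,\mathbb{E}\Bigl\{p_{X_e|Y_{-e}}(x_e|Y_{-e})\log\Bigl[\sum_{x_e'}p_e(y_e|x_e')\,p_{X_e|Y_{-e}}(x_e'|Y_{-e})\Bigr]\Bigr\}. \tag{177} $$

Taking the difference of Eq. (174) and Eq. (177) we obtain the desired formula. □

7.2 Application to the stochastic block model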


We next apply the general differentiation Lemma 7.1 to the stochastic block model. As mentioned above, this fits the framework in the previous section, by setting $Y$ to be the adjacency matrix of the graph $G$, and taking $p_X$ to be the uniform distribution over $\{+1,-1\}^n$. For the sake of convenience, we will encode this as $Y_e = 2G_e - 1$. In other words $\mathcal{Y} = \{+1,-1\}$ and $Y_e = +1$ (respectively $= -1$) encodes the fact that edge $e$ is present (respectively, absent). We then have the following channel model for all $e\in\binom{[n]}{2}$:

$$ p_e(+|+) = p_n, \qquad p_e(+|-) = q_n, \tag{178} $$

$$ p_e(-|+) = 1 - p_n, \qquad p_e(-|-) = 1 - q_n. \tag{179} $$

We parametrize these probability kernels by a common parameter $\theta\in\mathbb{R}_{\ge0}$ by letting

$$ p_n = \bar p_n + \sqrt{\frac{\bar p_n(1-\bar p_n)\,\theta}{n}}, \qquad q_n = \bar p_n - \sqrt{\frac{\bar p_n(1-\bar p_n)\,\theta}{n}}. \tag{180} $$

We will be eventually interested in setting $\theta = \lambda_n$ to make contact with the setting of Theorem 1.4.

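Lemma 7.2. Let $I(X;G)$ be the mutual information of the two-groups stochastic block model with parameters $p_n = p_n(\theta)$ and $q_n = q_n(\theta)$ given by Eq. (180). Then there exists a numerical constant $C$ such that the following happens.

For any $\theta_{\max}>0$ there exists $n_0(\theta_{\max})$ such that, if $n\ge n_0(\theta_{\max})$ then for all $\theta\in[0,\theta_{\max}]$,

$$ \Bigl|\frac{1}{n}\frac{dI(X;G)}{d\theta} - \frac{1}{4}\,\mathrm{MMSE}_n(\theta)\Bigr| \le C\Bigl(\sqrt{\frac{\theta}{n\bar p_n(1-\bar p_n)}}\vee\frac{1}{n}\Bigr). \tag{181} $$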
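Proof. We let $Y_e = 2G_e - 1$ and apply Lemma 7.1. Simple calculus yields

$$ \frac{dp_e(y_e|x_e)}{d\theta} = \frac{1}{2}\sqrt{\frac{\bar p_n(1-\bar p_n)}{n\theta}}\;x_ey_e. \tag{182} $$

From Eq. (168), letting $\widehat{x}_e(Y_{-e}) \equiv \mathbb{E}\{X_e|Y_{-e}\}$,

$$ \frac{dH(X|Y)}{d\theta} = \frac{1}{2}\sqrt{\frac{\bar p_n(1-\bar p_n)}{n\theta}}\sum_{e\in\binom{[n]}{2}}\sum_{x_e,y_e}x_ey_e\,\mathbb{E}\Bigl\{p_{X_e|Y_{-e}}(x_e|Y_{-e})\log\Bigl[\sum_{x_e'}p_e(y_e|x_e')\,p_{X_e|Y_{-e}}(x_e'|Y_{-e})\Bigr]\Bigr\} - \frac{1}{2}\sqrt{\frac{\bar p_n(1-\bar p_n)}{n\theta}}\sum_{e\in\binom{[n]}{2}}\sum_{x_e,y_e}x_ey_e\,p_{X_e}(x_e)\log p_e(y_e|x_e) \tag{183} $$

$$ = \frac{1}{2}\sqrt{\frac{\bar p_n(1-\bar p_n)}{n\theta}}\sum_{e\in\binom{[n]}{2}}\mathbb{E}\Bigl\{\widehat{x}_e(Y_{-e})\log\Bigl[\frac{\sum_{x_e'}p_e(+1|x_e')\,p_{X_e|Y_{-e}}(x_e'|Y_{-e})}{\sum_{x_e'}p_e(-1|x_e')\,p_{X_e|Y_{-e}}(x_e'|Y_{-e})}\Bigr]\Bigr\} - \frac{1}{4}\sqrt{\frac{\bar p_n(1-\bar p_n)}{n\theta}}\binom{n}{2}\log\Bigl[\frac{p_e(+|+)\,p_e(-|-)}{p_e(+|-)\,p_e(-|+)}\Bigr]. \tag{184} $$

Notice that, letting $\Delta_n = \sqrt{\bar p_n(1-\bar p_n)\theta/n}$,

$$ \frac{\sum_{x_e'}p_e(+1|x_e')\,p_{X_e|Y_{-e}}(x_e'|Y_{-e})}{\sum_{x_e'}p_e(-1|x_e')\,p_{X_e|Y_{-e}}(x_e'|Y_{-e})} = \frac{(p_n+q_n) + (p_n-q_n)\,\widehat{x}_e(Y_{-e})}{(2-p_n-q_n) - (p_n-q_n)\,\widehat{x}_e(Y_{-e})} \tag{185} $$

$$ = \frac{\bar p_n}{1-\bar p_n}\cdot\frac{1+(\Delta_n/\bar p_n)\,\widehat{x}_e(Y_{-e})}{1-(\Delta_n/(1-\bar p_n))\,\widehat{x}_e(Y_{-e})}, \tag{186} $$

$$ \frac{p_e(+|+)\,p_e(-|-)}{p_e(+|-)\,p_e(-|+)} = \frac{(1+\Delta_n/\bar p_n)(1+\Delta_n/(1-\bar p_n))}{(1-\Delta_n/\bar p_n)(1-\Delta_n/(1-\bar p_n))}. \tag{187} $$

Since $|\Delta_n/\bar p_n|,\ |\Delta_n/(1-\bar p_n)| \le \sqrt{\theta/[\bar p_n(1-\bar p_n)n]}\to0$, and $|\widehat{x}_e(Y_{-e})|\le1$, we obtain the following bounds by Taylor expansion

$$ \Bigl|\log\Bigl[\frac{\sum_{x_e'}p_e(+1|x_e')\,p_{X_e|Y_{-e}}(x_e'|Y_{-e})}{\sum_{x_e'}p_e(-1|x_e')\,p_{X_e|Y_{-e}}(x_e'|Y_{-e})}\Bigr] - B_0 - \frac{\Delta_n}{\bar p_n(1-\bar p_n)}\,\widehat{x}_e(Y_{-e})\Bigr| \le C\,\frac{\theta}{n\bar p_n(1-\bar p_n)}, \tag{188} $$

$$ \Bigl|\log\Bigl[\frac{p_e(+|+)\,p_e(-|-)}{p_e(+|-)\,p_e(-|+)}\Bigr] - \frac{2\Delta_n}{\bar p_n(1-\bar p_n)}\Bigr| \le C\,\frac{\theta}{n\bar p_n(1-\bar p_n)}, \tag{189} $$

where $B_0 \equiv \log(\bar p_n/(1-\bar p_n))$ and $C$ will denote a numerical constant that will change from line to line in the following. Such bounds hold for all $\theta\in[0,\theta_{\max}]$ provided $n\ge n_0(\theta_{\max})$.

Substituting these bounds in Eq. (184) and using $\mathbb{E}\{\widehat{x}_e(Y_{-e})\} = \mathbb{E}\{X_e\} = 0$, after some manipulations, we get

$$ \Bigl|\frac{1}{n}\frac{dH(X|Y)}{d\theta} + \frac{1}{2n^2}\sum_{e\in\binom{[n]}{2}}\bigl(1 - \mathbb{E}\{\widehat{x}_e(Y_{-e})^2\}\bigr)\Bigr| \le C\sqrt{\frac{\theta}{n\bar p_n(1-\bar p_n)}}. \tag{190} $$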
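We now define (with a slight overloading of notation) $\widehat{x}_e(Y) \equiv \mathbb{E}\{X_e|Y\}$, and relate $\widehat{x}_e(Y_{-e}) = \mathbb{E}\{X_e|Y_{-e}\}$ to the overall conditional expectation $\widehat{x}_e(Y)$. By Bayes formula we have

$$ p_{X_e|Y}(x_e|Y) = \frac{p_e(Y_e|x_e)\,p_{X_e|Y_{-e}}(x_e|Y_{-e})}{\sum_{x_e'}p_e(Y_e|x_e')\,p_{X_e|Y_{-e}}(x_e'|Y_{-e})}. \tag{191} $$

Rewriting this identity in terms of $\widehat{x}_e(Y)$, $\widehat{x}_e(Y_{-e})$, we obtain

$$ \widehat{x}_e(Y) = \frac{\widehat{x}_e(Y_{-e}) + b(Y_e)}{1 + b(Y_e)\,\widehat{x}_e(Y_{-e})}, \tag{192} $$

$$ b(Y_e) = \frac{p_e(Y_e|+1) - p_e(Y_e|-1)}{p_e(Y_e|+1) + p_e(Y_e|-1)}. \tag{193} $$

Using the definition of $p_e(y_e|x_e)$, we obtain

$$ b(y_e) = \begin{cases}\sqrt{(1-\bar p_n)\,\theta/(n\bar p_n)} & \text{if } y_e = +1,\\[2pt] -\sqrt{\bar p_n\,\theta/(n(1-\bar p_n))} & \text{if } y_e = -1.\end{cases} \tag{194} $$

This in particular implies $|b(y_e)| \le \sqrt{\theta/[n\bar p_n(1-\bar p_n)]}$. From Eq. (192) we therefore get (recalling $|\widehat{x}_e(Y_{-e})|\le1$)

$$ |\widehat{x}_e(Y) - \widehat{x}_e(Y_{-e})| = \frac{|b(Y_e)|\,(1-\widehat{x}_e(Y_{-e})^2)}{|1 + b(Y_e)\,\widehat{x}_e(Y_{-e})|} \le C\sqrt{\frac{\theta}{n\bar p_n(1-\bar p_n)}}. \tag{195} $$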
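The update (192)-(194) is a one-line Bayes computation, which the following sketch checks against a direct evaluation of the posterior. The function names and numerical values are illustrative assumptions; the channel is the one of Eqs. (178)-(180).

```python
import math


def channel(y, x, p_bar, theta, n):
    """p_e(y | x) from Eqs. (178)-(180), with y, x in {+1, -1}."""
    delta = math.sqrt(p_bar * (1 - p_bar) * theta / n)
    prob_edge = p_bar + delta * x            # p_n if x = +1, q_n if x = -1
    return prob_edge if y == +1 else 1 - prob_edge


def b_coeff(y, p_bar, theta, n):
    """Eq. (193): normalized difference of the two likelihoods."""
    plus = channel(y, +1, p_bar, theta, n)
    minus = channel(y, -1, p_bar, theta, n)
    return (plus - minus) / (plus + minus)


def posterior_mean_update(x_prev, y, p_bar, theta, n):
    """Eq. (192): E{X_e | Y} from x_prev = E{X_e | Y_{-e}} and Y_e = y."""
    b = b_coeff(y, p_bar, theta, n)
    return (x_prev + b) / (1 + b * x_prev)


def posterior_mean_direct(x_prev, y, p_bar, theta, n):
    """Same quantity computed directly from Bayes formula, Eq. (191)."""
    w_plus = channel(y, +1, p_bar, theta, n) * (1 + x_prev) / 2
    w_minus = channel(y, -1, p_bar, theta, n) * (1 - x_prev) / 2
    return (w_plus - w_minus) / (w_plus + w_minus)


p_bar, theta, n = 0.1, 3.0, 5000
b_plus = b_coeff(+1, p_bar, theta, n)
b_minus = b_coeff(-1, p_bar, theta, n)
```

Since $|b(y_e)|$ is of order $\sqrt{\theta/(n\bar p_n(1-\bar p_n))}$, observing a single edge moves the posterior mean of $X_e$ by a vanishing amount, which is the content of Eq. (195).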


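Substituting this in Eq. (190), we get

$$ \Bigl|\frac{1}{n}\frac{dH(X|Y)}{d\theta} + \frac{1}{2n^2}\sum_{e\in\binom{[n]}{2}}\bigl(1 - \mathbb{E}\{\widehat{x}_e(Y)^2\}\bigr)\Bigr| \le C\Bigl(\sqrt{\frac{\theta}{n\bar p_n(1-\bar p_n)}}\vee\frac{1}{n}\Bigr). \tag{196} $$

Finally we rewrite the sum over $e\in\binom{[n]}{2}$ explicitly as a sum over $i<j$ and recall that $X_e = X_iX_j$ to get

$$ \Bigl|\frac{1}{n}\frac{dH(X|Y)}{d\theta} + \frac{1}{4}\binom{n}{2}^{-1}\sum_{1\le i<j\le n}\mathbb{E}\bigl\{(X_iX_j - \mathbb{E}\{X_iX_j|Y\})^2\bigr\}\Bigr| \le C\Bigl(\sqrt{\frac{\theta}{n\bar p_n(1-\bar p_n)}}\vee\frac{1}{n}\Bigr). \tag{197} $$

Since $Y$ is equivalent to $G$ (up to a change of variables) and $I(X;G) = H(X) - H(X|G)$, with $H(X) = n\log 2$ independent of $\theta$, this is equivalent to our claim (recall the definition of $\mathrm{MMSE}_n(\cdot)$, Eq. (14)). □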

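7.3 Proof of Theorem 1.4

From Lemma 7.2 and Theorem 1.1 we obtain, for any $0<\lambda_1<\lambda_2$,

$$ \lim_{n\to\infty}\int_{\lambda_1}^{\lambda_2}\frac{1}{4}\,\mathrm{MMSE}_n(\theta)\,d\theta = \Psi(\gamma_*(\lambda_2),\lambda_2) - \Psi(\gamma_*(\lambda_1),\lambda_1). \tag{198} $$

From Lemma 6.3 and 6.4

$$ \lim_{n\to\infty}\int_{\lambda_1}^{\lambda_2}\mathrm{MMSE}_n(\theta)\,d\theta = \int_{\lambda_1}^{\lambda_2}\Bigl(1 - \frac{\gamma_*(\theta)^2}{\theta^2}\Bigr)\,d\theta. \tag{199} $$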
A Estimation metrics: proofs

A.1 Proof of Lemma 3.5

Let us begin with the upper bound on $\mathrm{vmmse}_n(\lambda)$. By using $\widehat{x}(G) = \mathbb{E}\{X\,|\,X_1 = +1, G\}$ in Eq. (32), we get


vmmse„(A) < — e| min IIX — sEiXlAi =+ 1 , ( 200 ) 

n t se{+i,-i} 

= -e| min IIX - sEiXlAi =+l,G}||^|Xi =+l]- ( 201 ) 

n t se{+i,-i} ^ J 

< iE|||A:-sE{X|Ai =+1,G}||2 |A:i = +i} (202) 

<e|||X-E{X|Xi = +l,G}|| 2 |Yi = +l}, ( 203 ) 


where the equality on the second line follows because $(X,G)$ is distributed as $(-X,G)$. The last inequality yields the desired upper bound $\mathrm{vmmse}_n(\lambda)\le\mathrm{MMSE}_n(\lambda)$.

In order to prove the lower bound on $\mathrm{vmmse}_n(\lambda)$ assume, for the sake of simplicity, that the infimum in the definition (32) is achieved at a certain estimator $\widehat{x}(\cdot)$. If this is not the case, the argument below can be easily adapted by letting $\widehat{x}(\cdot)$ be an estimator that achieves error within $\varepsilon$ of the infimum.

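Under this assumption, we have, from (32),

$$ \mathrm{vmmse}_n(\lambda) = \mathbb{E}\min_{s\in\{+1,-1\}}\Bigl\{1 - \frac{2s}{n}\langle X,\widehat{x}(G)\rangle + \frac{1}{n}\|\widehat{x}(G)\|_2^2\Bigr\} \tag{204} $$

$$ \ge \mathbb{E}\min_{\alpha\in\mathbb{R}}\Bigl\{1 - \frac{2\alpha}{n}\langle X,\widehat{x}(G)\rangle + \frac{\alpha^2}{n}\|\widehat{x}(G)\|_2^2\Bigr\} \tag{205} $$

$$ = 1 - \mathbb{E}\Bigl\{\frac{\langle X,\widehat{x}(G)\rangle^2}{n\,\|\widehat{x}(G)\|_2^2}\Bigr\}, \tag{206} $$

where the last identity follows since the minimum over $\alpha$ is achieved at $\alpha = \langle X,\widehat{x}(G)\rangle/\|\widehat{x}(G)\|_2^2$.

Consider next the matrix minimum mean square error. Let $\widehat{x}(G) = (\widehat{x}_i(G))_{i\in[n]}$ be an optimal estimator with respect to $\mathrm{vmmse}_n(\lambda)$, and define

$$ \widehat{x}_{ij}(G) = \beta(G)\,\widehat{x}_i(G)\,\widehat{x}_j(G), \qquad \beta(G) = \frac{1}{\|\widehat{x}(G)\|_2^2}\;\mathbb{E}\Bigl(\frac{\langle X,\widehat{x}(G)\rangle^2}{\|\widehat{x}(G)\|_2^2}\Bigr). \tag{207} $$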
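Using Eq. (14) and the optimality of posterior mean, we obtain

$$ (1-n^{-1})\,\mathrm{MMSE}_n(\lambda) \le \frac{1}{n^2}\sum_{i,j\in[n]}\mathbb{E}\bigl\{(X_iX_j - \widehat{x}_{ij}(G))^2\bigr\} \tag{208} $$

$$ = \frac{1}{n^2}\,\mathbb{E}\bigl\{\|XX^{\sf T} - \beta(G)\,\widehat{x}(G)\widehat{x}^{\sf T}(G)\|_F^2\bigr\} \tag{209} $$

$$ = \mathbb{E}\Bigl\{1 - \frac{2\beta(G)}{n^2}\langle X,\widehat{x}(G)\rangle^2 + \frac{\beta(G)^2}{n^2}\|\widehat{x}(G)\|_2^4\Bigr\} \tag{210} $$

$$ = 1 - \Bigl(\mathbb{E}\Bigl\{\frac{\langle X,\widehat{x}(G)\rangle^2}{n\,\|\widehat{x}(G)\|_2^2}\Bigr\}\Bigr)^2. \tag{211} $$

The desired lower bound in Eq. (41) follows by comparing Eqs. (206) and (211).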

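A.2 Proof of Lemma 3.6

We shall assume, for the sake of simplicity, that the infimum in the definition of $\mathrm{vmmse}_n(\lambda)$, see Eq. (32), is achieved for a given estimator $\widehat{x} : \mathcal{G}_n\to\mathbb{R}^n$. If this is not the case, the proof below is easily adapted by considering an approximately optimal estimator. We then define $\widehat{\xi} : \mathcal{G}_n\to\mathbb{R}^n$ by letting

$$ \widehat{\xi}(G) \equiv \frac{\widehat{x}(G)\sqrt{n}}{\|\widehat{x}(G)\|_2}. \tag{212} $$

Notice that $\|\widehat{\xi}(G)\|_2 = \sqrt{n}$. Also by the proof in the previous section, see Eq. (206), we have

$$ \mathrm{vmmse}_n(\lambda) \ge 1 - \mathbb{E}\Bigl\{\frac{1}{n^2}\langle X,\widehat{\xi}(G)\rangle^2\Bigr\}, \tag{213} $$

and therefore (since $|\langle X,\widehat{\xi}(G)\rangle|\le n$)

$$ \mathbb{E}\Bigl\{\Bigl|\frac{1}{n}\langle X,\widehat{\xi}(G)\rangle\Bigr|\Bigr\} \ge 1 - \mathrm{vmmse}_n(\lambda). \tag{214} $$

Next consider the randomized estimator $\widehat{s} : \mathcal{G}_n\to\{+1,-1\}^n$ defined by letting $\widehat{s}(G) = (\widehat{s}_i(G))_{i\in[n]}$ with

$$ \widehat{s}_i(G) = \begin{cases}+1 & \text{with probability } (1+\widehat{\xi}_i(G))/2,\\ -1 & \text{with probability } (1-\widehat{\xi}_i(G))/2,\end{cases} \tag{215} $$

independently across $i\in[n]$. (Formally, $\widehat{s} : \mathcal{G}_n\times\Omega\to\{+1,-1\}^n$ with $\Omega$ a probability space, but we prefer to avoid unnecessary technicalities.)

We then have, by central limit theorem

$$ \mathbb{E}\Bigl\{\frac{1}{n}|\langle X,\widehat{s}(G)\rangle|\;\Big|\;X, G\Bigr\} = \frac{1}{n}|\langle X,\widehat{\xi}(G)\rangle| + O(n^{-1/2}), \tag{216} $$

with the $O(n^{-1/2})$ uniform in $X$, $G$. This yields the desired lower bound since, by dominated convergence,

$$ \mathrm{Overlap}_n(\lambda) \ge \mathbb{E}\Bigl\{\frac{1}{n}|\langle X,\widehat{s}(G)\rangle|\Bigr\} \tag{217} $$

$$ \ge \mathbb{E}\Bigl\{\frac{1}{n}|\langle X,\widehat{\xi}(G)\rangle|\Bigr\} - O(n^{-1/2}) \tag{218} $$

$$ \ge 1 - \mathrm{vmmse}_n(\lambda) - O(n^{-1/2}). \tag{219} $$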
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B Additional technical proofs 

B.l Proof of Remark 15.41 

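We prove the claim for $\langle x, Gx\rangle$; the other claim follows from an identical argument. Since $\mathbb{E}\{\langle x, Gx\rangle \mid X\} = \sum_{i,j} x_i x_j\, \mathbb{E}\{G_{ij} \mid X\}$, we have by the triangle inequality that $|\mathbb{E}\{\langle x, Gx\rangle \mid X\}| \le 2n(n-1)p_n$. Applying the Bernstein inequality to the sum $\langle x, Gx\rangle - \mathbb{E}\{\langle x, Gx\rangle \mid X\} = \sum_{i,j} x_i x_j (G_{ij} - \mathbb{E}\{G_{ij} \mid X\})$ of random variables bounded by 1:

\begin{align}
\mathbb{P}\Big\{\sup_{x\in\{\pm 1\}^n} \big(\langle x, Gx\rangle - \mathbb{E}\{\langle x, Gx\rangle \mid X\}\big) \ge t\Big\} &\le 2^n \sup_{x\in\{\pm 1\}^n} \mathbb{P}\big\{\langle x, Gx\rangle - \mathbb{E}\{\langle x, Gx\rangle \mid X\} \ge t\big\} \tag{220}\\
&\le 2^n \exp\Big(-\frac{t^2}{2(n^2 p_n + t)}\Big). \tag{221}
\end{align}

Setting $t = Cn^2 p_n$ for large enough $C$ yields the required result.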
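To see how the choice $t = Cn^2p_n$ beats the union-bound factor $2^n$, one can evaluate the logarithm of the right-hand side of Eq. (221) directly. The sketch below does this; the numerical values of `n`, `p_n` and `C` are hypothetical and chosen only for illustration.

```python
import math

def log_union_bernstein_bound(n, p_n, C):
    """Natural log of 2^n * exp(-t^2 / (2 (n^2 p_n + t))) at t = C n^2 p_n."""
    t = C * n**2 * p_n
    return n * math.log(2.0) - t**2 / (2.0 * (n**2 * p_n + t))

# Substituting t = C n^2 p_n gives n*log(2) - C^2 n^2 p_n / (2 (1 + C)),
# so the bound is non-trivial once C^2 n p_n / (2 (1 + C)) exceeds log(2).
print(log_union_bernstein_bound(1000, 0.01, 2.0))   # negative: bound is useful
print(log_union_bernstein_bound(1000, 0.01, 0.1))   # positive: bound is vacuous
```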

B.2 Proof of Lemma 6.1

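Let us start from point (a). Since $\mathrm{mmse}(\gamma, \varepsilon) = \varepsilon + (1-\varepsilon)(1 - \mathrm{mmse}(\gamma))$, it is sufficient to prove this claim for

$$G(\gamma) = 1 - \mathrm{mmse}(\gamma) = \mathbb{E}\big\{\tanh(\gamma + \sqrt{\gamma}Z)^2\big\}, \tag{222}$$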

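where, for the rest of the proof, we keep $Z \sim \mathsf{N}(0,1)$. We start by noting that, for all $k \in \mathbb{Z}_{>0}$,

$$\mathbb{E}\big\{\tanh(\gamma + \sqrt{\gamma}Z)^{2k}\big\} = \mathbb{E}\big\{\tanh(\gamma + \sqrt{\gamma}Z)^{2k-1}\big\}. \tag{223}$$

This identity can be proved using the fact that $\mathbb{E}\{X \mid \sqrt{\gamma}X + Z\} = \tanh(\gamma X + \sqrt{\gamma}Z)$. Indeed this yields

\begin{align}
\mathbb{E}\big\{\tanh(\gamma + \sqrt{\gamma}Z)^{2k}\big\} &= \mathbb{E}\big\{\tanh(\gamma X + \sqrt{\gamma}Z)^{2k}\big\} \tag{224}\\
&= \mathbb{E}\big\{\mathbb{E}\{X \mid \sqrt{\gamma}X + Z\}\,\tanh(\gamma X + \sqrt{\gamma}Z)^{2k-1}\big\} \tag{225}\\
&= \mathbb{E}\big\{X \tanh(\gamma X + \sqrt{\gamma}Z)^{2k-1}\big\} \tag{226}\\
&= \mathbb{E}\big\{\tanh(\gamma + \sqrt{\gamma}Z)^{2k-1}\big\}, \tag{227}
\end{align}

where the first and last equalities follow by symmetry.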

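By identity (223) with $k = 1$, we have $G(\gamma) = \mathbb{E}\{\tanh(\gamma + \sqrt{\gamma}Z)\}$. Differentiating this expression with respect to $\gamma$ (which can be justified by dominated convergence):

\begin{align}
G'(\gamma) &= \mathbb{E}\Big\{\Big(1 + \frac{Z}{2\sqrt{\gamma}}\Big)\,\mathrm{sech}(\gamma + \sqrt{\gamma}Z)^2\Big\} \tag{228}\\
&= \mathbb{E}\big\{\mathrm{sech}(\gamma + \sqrt{\gamma}Z)^2\big\} + \frac{1}{2\sqrt{\gamma}}\,\mathbb{E}\big\{Z\,\mathrm{sech}(\gamma + \sqrt{\gamma}Z)^2\big\}. \tag{229}
\end{align}

Now applying Stein's lemma (or Gaussian integration by parts):

$$G'(\gamma) = \mathbb{E}\big\{\mathrm{sech}(\gamma + \sqrt{\gamma}Z)^2\big\} - \mathbb{E}\big\{\tanh(\gamma + \sqrt{\gamma}Z)\,\mathrm{sech}(\gamma + \sqrt{\gamma}Z)^2\big\}. \tag{230}$$

Using the hyperbolic identity $\mathrm{sech}(z)^2 = 1 - \tanh(z)^2$, the shorthand $T = \tanh(\gamma + \sqrt{\gamma}Z)$ and identity (223) above:

\begin{align}
G'(\gamma) &= \mathbb{E}\{1 - T^2 - T + T^3\} \tag{231}\\
&= \mathbb{E}\{1 - 2T^2 + T^4\} = \mathbb{E}\{(1 - T^2)^2\} \tag{232}\\
&= \mathbb{E}\big\{\mathrm{sech}(\gamma + \sqrt{\gamma}Z)^4\big\}. \tag{233}
\end{align}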
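Identity (223) and the formula (233) for $G'(\gamma)$ are easy to check numerically. The sketch below approximates Gaussian expectations by a Riemann sum on a fine grid; the helper names, the grid parameters and the test values of $\gamma$ are choices made for this illustration.

```python
import numpy as np

def gauss_mean(f, half_width=12.0, num=24001):
    """Approximate E{f(Z)}, Z ~ N(0,1), by a Riemann sum on a uniform grid."""
    z, h = np.linspace(-half_width, half_width, num, retstep=True)
    weights = np.exp(-z**2 / 2.0) / np.sqrt(2.0 * np.pi) * h
    return float(np.sum(f(z) * weights))

def tanh_moment(gamma, power):
    """E{tanh(gamma + sqrt(gamma) Z)^power}."""
    return gauss_mean(lambda z: np.tanh(gamma + np.sqrt(gamma) * z) ** power)

def G(gamma):
    """G(gamma) = E{tanh(gamma + sqrt(gamma) Z)^2}, as in Eq. (222)."""
    return tanh_moment(gamma, 2)

def G_prime_formula(gamma):
    """Right-hand side of Eq. (233): E{sech(gamma + sqrt(gamma) Z)^4}."""
    return gauss_mean(lambda z: np.cosh(gamma + np.sqrt(gamma) * z) ** -4.0)

for gamma in (0.3, 1.0, 2.5):
    step = 1e-4
    finite_diff = (G(gamma + step) - G(gamma - step)) / (2.0 * step)
    print(gamma,
          tanh_moment(gamma, 2) - tanh_moment(gamma, 1),   # identity (223), k = 1
          finite_diff - G_prime_formula(gamma))            # Eq. (233)
```

Both printed differences should be zero up to numerical error.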

Now, let $\psi(z) = \mathrm{sech}^4(z)$, whereby we have

$$G'(\gamma) = \mathbb{E}\big\{\psi\big(\sqrt{\gamma}(\sqrt{\gamma} + Z)\big)\big\}. \tag{234}$$


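Note now that $\psi$ satisfies (i) $\psi(z)$ is even with $z\psi'(z) \le 0$, (ii) $\psi(z)$ is continuously differentiable and (iii) $\psi(z)$ and $\psi'(z)$ are bounded. Consider the function $H(x,y) = \mathbb{E}\{\psi(x(Z+y))\}$, where $x \ge 0$. We have the identities:

\begin{align}
G'(\gamma) &= H(\sqrt{\gamma}, \sqrt{\gamma}), \tag{235}\\
G''(\gamma) &= \frac{1}{2\sqrt{\gamma}}\Big(\frac{\partial H}{\partial x}\Big|_{x=y=\sqrt{\gamma}} + \frac{\partial H}{\partial y}\Big|_{x=y=\sqrt{\gamma}}\Big). \tag{236}
\end{align}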


Hence, to prove that $G$ is concave on $\gamma \ge 0$, it suffices to show that $\partial H/\partial x$, $\partial H/\partial y$ are non-positive for $x, y \ge 0$. By properties (ii) and (iii) above we can differentiate $H$ with respect to $x, y$ and interchange differentiation and expectation.

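We first prove that $\partial H/\partial x$ is non-positive:

\begin{align}
\frac{\partial H}{\partial x} &= \mathbb{E}\{(Z+y)\,\psi'(x(Z+y))\} \tag{237}\\
&= \int_{-\infty}^{\infty} \varphi(z)(z+y)\,\psi'(x(z+y))\,dz \tag{238}\\
&= \int_{-\infty}^{\infty} z\,\varphi(z-y)\,\psi'(xz)\,dz. \tag{239}
\end{align}

Here $\varphi(z)$ is the Gaussian density $\varphi(z) = \exp(-z^2/2)/\sqrt{2\pi}$. Since $z\psi'(xz) \le 0$ by property (i) and $\varphi(z-y) > 0$, we have the required claim.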

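Computing the derivative with respect to $y$ yields

\begin{align}
\frac{\partial H}{\partial y} &= x\,\mathbb{E}\{\psi'(x(Z+y))\} \tag{240}\\
&= \int_{-\infty}^{\infty} x\,\psi'(x(z+y))\,\varphi(z)\,dz \tag{241}\\
&= \int_{-\infty}^{\infty} x\,\psi'(xz)\,\varphi(z-y)\,dz \tag{242}\\
&= \int_{0}^{\infty} x\,\psi'(xz)\,\varphi(z-y)\,dz - \int_{0}^{\infty} x\,\psi'(xz)\,\varphi(y+z)\,dz, \tag{243}
\end{align}

where the last line follows from the fact that $\psi'(u)$ is odd and $\varphi(u)$ is even in $u$. Consequently

$$\frac{\partial H}{\partial y} = x\int_{0}^{\infty} \psi'(xz)\big(\varphi(y-z) - \varphi(y+z)\big)\,dz. \tag{244}$$

Since $\varphi(y-z) \ge \varphi(y+z)$ and $\psi'(xz) \le 0$ for $y, z \ge 0$, the integrand is non-positive and we obtain the desired result.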
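The two sign claims can also be checked numerically from Eqs. (237) and (240). The following sketch uses $\psi'(u) = -4\,\mathrm{sech}^4(u)\tanh(u)$, obtained by differentiating $\psi(u) = \mathrm{sech}^4(u)$; the grid of $(x, y)$ values and the quadrature helper are assumptions of this illustration.

```python
import numpy as np

def gauss_mean(f, half_width=12.0, num=24001):
    """Approximate E{f(Z)}, Z ~ N(0,1), by a Riemann sum on a uniform grid."""
    z, h = np.linspace(-half_width, half_width, num, retstep=True)
    weights = np.exp(-z**2 / 2.0) / np.sqrt(2.0 * np.pi) * h
    return float(np.sum(f(z) * weights))

def psi_prime(u):
    """Derivative of psi(u) = sech(u)^4."""
    return -4.0 * np.tanh(u) / np.cosh(u) ** 4

def dH_dx(x, y):
    """Eq. (237): E{(Z + y) psi'(x (Z + y))}."""
    return gauss_mean(lambda z: (z + y) * psi_prime(x * (z + y)))

def dH_dy(x, y):
    """Eq. (240): x E{psi'(x (Z + y))}."""
    return x * gauss_mean(lambda z: psi_prime(x * (z + y)))

for x in (0.2, 1.0, 3.0):
    for y in (0.0, 0.5, 2.0):
        print(x, y, dH_dx(x, y), dH_dy(x, y))   # both derivatives should be <= 0
```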
B.3 Proof of Remark 6.5


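For any random variable $R$ we have

$$H(X \mid R) - H(XX^{\mathsf T} \mid R) = H(X \mid XX^{\mathsf T}, R) - H(XX^{\mathsf T} \mid X, R) = H(X \mid XX^{\mathsf T}, R). \tag{245}$$

Since $H(X \mid XX^{\mathsf T}, R) \le H(X \mid XX^{\mathsf T}) \le \log 2$ (given $XX^{\mathsf T}$ there are exactly 2 possible choices for $X$), this implies

$$0 \le H(X \mid R) - H(XX^{\mathsf T} \mid R) \le \log 2. \tag{246}$$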
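For a concrete instance of Eq. (246), the gap can be computed by brute force when there is no side information $R$. The sketch below assumes $X$ uniform on $\{+1,-1\}^n$ for small $n$ and uses natural logarithms; the function names are illustrative.

```python
import itertools
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

def entropy_gap(n):
    """H(X) - H(X X^T) for X uniform on {+1,-1}^n, with no side information."""
    labels = list(itertools.product((-1, 1), repeat=n))
    h_x = entropy(Counter(labels))
    # Flatten X X^T into a tuple so that it can be used as a dictionary key.
    outer = Counter(tuple(a * b for a in x for b in x) for x in labels)
    return h_x - entropy(outer)

for n in (2, 3, 5):
    print(n, entropy_gap(n), math.log(2.0))
```

In this case the upper bound is attained: $X$ and $-X$ give the same matrix $XX^{\mathsf T}$, so exactly $\log 2$ nats are lost.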


The claim (147) follows by applying the last inequality once to $R = \emptyset$ and once to $R = (X(\varepsilon), Y(\lambda))$ and taking the difference.

The claim (148) follows from the fact that $Y(0)$ is independent of $X$, and hence $I(X; X(\varepsilon), Y(0)) = I(X; X(\varepsilon)) = n\varepsilon \log 2$.

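For the second claim, we prove that $\limsup_{n\to\infty} \frac{1}{n} H(X \mid Y(\lambda), X(\varepsilon)) \le \delta(\lambda)$ where $\delta(\lambda) \to 0$ as $\lambda \to \infty$, whence the claim follows since $H(X) = n\log 2$. We claim that we can construct an estimator $\hat{x}(Y) \in \{-1,1\}^n$ and a function $\delta_1(\lambda)$ with $\lim_{\lambda\to\infty} \delta_1(\lambda) = 0$, such that, defining

$$E(\lambda, n) = \begin{cases} 1 & \text{if } \min\big(d(\hat{x}(Y), X),\, d(\hat{x}(Y), -X)\big) \ge n\delta_1(\lambda),\\ 0 & \text{otherwise,} \end{cases} \tag{247}$$

then we have

$$\lim_{n\to\infty} \mathbb{P}\{E(\lambda, n) = 1\} = 0. \tag{248}$$

To prove this claim, it is sufficient to consider $\hat{x}(Y) = \mathrm{sgn}(v_1(Y))$ where $v_1(Y)$ is the principal eigenvector of $Y$. Then [CDMF09, BGN11] implies that, for $\lambda > 1$, almost surely,

$$\lim_{n\to\infty} \frac{1}{\sqrt{n}}|\langle v_1(Y), X\rangle| = \sqrt{1 - \lambda^{-1}}. \tag{249}$$

Hence the above claim holds, for instance, with $\delta_1(\lambda) = 2/\lambda$.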
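The spectral estimator and the limit in Eq. (249) can be illustrated by simulation. The sketch below assumes the observation model $Y = \sqrt{\lambda/n}\,XX^{\mathsf T} + Z$ with $Z$ a symmetric Gaussian matrix (this model is not restated in this appendix and is taken here as an assumption); the values of `n`, `lam` and the random seed are arbitrary.

```python
import numpy as np

def spectral_overlap(n, lam, rng):
    """Return (1/sqrt(n)) |<v1, X>| and the mismatch fraction of sign(v1)."""
    X = rng.choice([-1.0, 1.0], size=n)
    A = rng.normal(size=(n, n))
    Z = (A + A.T) / np.sqrt(2.0)                 # symmetric, off-diagonal variance 1
    Y = np.sqrt(lam / n) * np.outer(X, X) + Z    # assumed spiked model
    eigvals, eigvecs = np.linalg.eigh(Y)         # eigenvalues in ascending order
    v1 = eigvecs[:, -1]                          # unit-norm principal eigenvector
    x_hat = np.sign(v1)
    corr = abs(np.dot(v1, X)) / np.sqrt(n)
    mismatches = min(np.sum(x_hat != X), np.sum(x_hat != -X))
    return corr, mismatches / n

lam = 9.0
corr, frac = spectral_overlap(n=1200, lam=lam, rng=np.random.default_rng(0))
print(corr, np.sqrt(1.0 - 1.0 / lam))   # empirical vs. limiting correlation
print(frac, 2.0 / lam)                  # mismatch fraction vs. delta_1 = 2/lambda
```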

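Then expanding $H(X, E \mid Y(\lambda), X(\varepsilon))$ with the chain rule (whereby $E = E(\lambda, n)$), we get:

\begin{align}
H(X \mid Y(\lambda), X(\varepsilon)) + H(E \mid X, Y(\lambda), X(\varepsilon)) &= H(X, E \mid Y(\lambda), X(\varepsilon)) \tag{250}\\
&= H(E \mid Y(\lambda), X(\varepsilon)) + H(X \mid E, Y(\lambda), X(\varepsilon)). \notag
\end{align}

Since $E$ is a function of $X, Y(\lambda)$, we have $H(E \mid X, Y(\lambda), X(\varepsilon)) = 0$. Furthermore $H(E \mid Y(\lambda), X(\varepsilon)) \le \log 2$ since $E$ is binary. Hence:

\begin{align}
H(X \mid Y(\lambda), X(\varepsilon)) &\le \log 2 + H(X \mid E, Y(\lambda), X(\varepsilon)) \tag{251}\\
&= \log 2 + \mathbb{P}\{E = 0\}\,H(X \mid E = 0, Y(\lambda), X(\varepsilon)) + \mathbb{P}\{E = 1\}\,H(X \mid E = 1, Y(\lambda), X(\varepsilon)). \notag
\end{align}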

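When $E = 0$, $X$ differs from $\pm\hat{x}(Y)$ in at most $\delta_1 n$ positions, whence $H(X \mid E = 0, Y(\lambda), X(\varepsilon)) \le n\delta_1\log(e/\delta_1) + \log 2$. When $E = 1$, we trivially have $H(X \mid E = 1, Y(\lambda), X(\varepsilon)) \le H(X) = n\log 2 \le n$. Since $\mathbb{P}\{E = 1\} \to 0$ by Eq. (248), we have $\mathbb{P}\{E = 1\} \le \delta_1$ for all $n$ large enough. Consequently, for such $n$:

$$H(X \mid Y(\lambda), X(\varepsilon)) \le 2\log 2 + n\delta_1\log\frac{e}{\delta_1} + n\delta_1. \tag{252}$$

The second claim then follows by dividing by $n$ and letting $n \to \infty$ on the right hand side.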
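The counting step behind the bound $n\delta_1\log(e/\delta_1)$ is the standard estimate $\sum_{k\le m}\binom{n}{k} \le (en/m)^m$ on the size of a Hamming ball. A minimal sketch comparing the exact count with this bound follows; the values of `n` and `delta` are illustrative.

```python
import math

def log_hamming_ball(n, delta):
    """Log of the number of sign vectors within Hamming distance delta*n of a fixed one."""
    radius = math.floor(delta * n)
    return math.log(sum(math.comb(n, k) for k in range(radius + 1)))

def entropy_bound(n, delta):
    """The bound n * delta * log(e / delta), in nats."""
    return n * delta * math.log(math.e / delta)

for n in (50, 200, 1000):
    for delta in (0.05, 0.2, 0.5):
        print(n, delta, log_hamming_ball(n, delta), entropy_bound(n, delta))
```

The extra $\log 2$ in the text accounts for the unknown global sign, that is, for whether $X$ is close to $\hat{x}(Y)$ or to $-\hat{x}(Y)$.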
B.4 Proof of Lemma 4.4


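The lemma results by reducing the AMP algorithm Eq. (61) to the setting of [JM13]. By definition, we have:

\begin{align}
x^{t+1} &= \frac{Y(\lambda)}{\sqrt{n}} f_t(x^t, X(\varepsilon)) - b_t f_{t-1}(x^{t-1}, X(\varepsilon)) \tag{253}\\
&= \frac{\sqrt{\lambda}}{n}\langle X, f_t(x^t, X(\varepsilon))\rangle X + \frac{1}{\sqrt{n}} Z f_t(x^t, X(\varepsilon)) - b_t f_{t-1}(x^{t-1}, X(\varepsilon)). \tag{254}
\end{align}

Define a related sequence $s^t \in \mathbb{R}^n$ as follows:

\begin{align}
s^{t+1} &= \frac{1}{\sqrt{n}} Z f_t(s^t + \mu_t X, X(\varepsilon)) - \tilde{b}_t f_{t-1}(s^{t-1} + \mu_{t-1} X, X(\varepsilon)), \tag{255}\\
\tilde{b}_t &= \frac{1}{n}\sum_{i\in[n]} f_t'(s_i^t + \mu_t X_i, X(\varepsilon)_i), \tag{256}\\
s^0 &= x^0 + \mu_0 X. \tag{257}
\end{align}

Here $(\mu_t, \sigma_t)$ is defined via the state evolution recursion:

\begin{align}
\mu_{t+1} &= \sqrt{\lambda}\,\mathbb{E}\{X_0 f_t(\mu_t X_0 + \sigma_t Z_0, X_0(\varepsilon))\}, \tag{258}\\
\sigma_{t+1}^2 &= \mathbb{E}\{f_t(\mu_t X_0 + \sigma_t Z_0, X_0(\varepsilon))^2\}. \tag{259}
\end{align}

We call a function $\psi : \mathbb{R}^d \to \mathbb{R}$ pseudo-Lipschitz if, for all $u, v \in \mathbb{R}^d$,

$$|\psi(u) - \psi(v)| \le L(1 + \|u\| + \|v\|)\,\|u - v\|, \tag{260}$$

where $L$ is a constant. In the rest of the proof, we will use $L$ to denote a constant that may depend on $t$ and $\psi$ but not on $n$, and can change from line to line.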
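The state evolution recursion for $(\mu_t, \sigma_t)$ above is a deterministic two-dimensional iteration and can be run directly. The sketch below is a simplified illustration under stated assumptions: no side information ($\varepsilon = 0$), $X_0$ uniform on $\{+1,-1\}$, the hypothetical choice $f_t(y) = \tanh(\mu_t y/\sigma_t^2)$ (the posterior mean of $X_0$ given $\mu_t X_0 + \sigma_t Z_0 = y$), and arbitrary initial values.

```python
import numpy as np

def gauss_mean(f, half_width=12.0, num=24001):
    """Approximate E{f(Z)}, Z ~ N(0,1), by a Riemann sum on a uniform grid."""
    z, h = np.linspace(-half_width, half_width, num, retstep=True)
    weights = np.exp(-z**2 / 2.0) / np.sqrt(2.0 * np.pi) * h
    return float(np.sum(f(z) * weights))

def posterior_mean(y, mu, sigma):
    """Hypothetical f_t: E{X0 | mu*X0 + sigma*Z0 = y} = tanh(mu*y/sigma^2)."""
    return np.tanh(mu * y / sigma**2)

def state_evolution(lam, f, mu0, sigma0, steps):
    """Iterate (mu_t, sigma_t) with X0 uniform on {+1,-1} and epsilon = 0."""
    mu, sigma = mu0, sigma0
    history = [(mu, sigma)]
    for _ in range(steps):
        # Average over X0 in {+1,-1}; the Gaussian average is done by gauss_mean.
        m = 0.5 * sum(x0 * gauss_mean(lambda z: f(mu * x0 + sigma * z, mu, sigma))
                      for x0 in (-1.0, 1.0))
        q = 0.5 * sum(gauss_mean(lambda z: f(mu * x0 + sigma * z, mu, sigma) ** 2)
                      for x0 in (-1.0, 1.0))
        mu, sigma = np.sqrt(lam) * m, np.sqrt(q)
        history.append((mu, sigma))
    return history

for lam in (0.5, 2.0):
    mu, sigma = state_evolution(lam, posterior_mean, mu0=0.1, sigma0=1.0, steps=40)[-1]
    print(lam, mu, sigma**2)
```

With this choice of $f_t$, the iteration stays near zero for $\lambda < 1$ and settles at a strictly positive value for $\lambda > 1$, in line with the phase transition discussed in the paper.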

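We are now ready to prove Lemma 4.4. Since the iteration for $s^t$ is in the form of [JM13], we have for any pseudo-Lipschitz function $\tilde{\psi}$:

$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \tilde{\psi}(s_i^t, X_i, X(\varepsilon)_i) = \mathbb{E}\{\tilde{\psi}(\sigma_t Z_0, X_0, X_0(\varepsilon))\}. \tag{261}$$

Letting $\tilde{\psi}(s, z, r) = \psi(s + \mu_t z, z, r)$, this implies that, almost surely:

$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \psi(s_i^t + \mu_t X_i, X_i, X(\varepsilon)_i) = \mathbb{E}\{\psi(\mu_t X_0 + \sigma_t Z_0, X_0, X_0(\varepsilon))\}. \tag{262}$$

It then suffices to show that, for any pseudo-Lipschitz function $\psi$, almost surely:

$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \big[\psi(s_i^t + \mu_t X_i, X_i, X(\varepsilon)_i) - \psi(x_i^t, X_i, X(\varepsilon)_i)\big] = 0. \tag{263}$$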


We instead prove the following claims that include the above. For any $t$ fixed, almost surely:

\begin{align}
\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \big[\psi(s_i^t + \mu_t X_i, X_i, X(\varepsilon)_i) - \psi(x_i^t, X_i, X(\varepsilon)_i)\big] &= 0, \tag{264}\\
\lim_{n\to\infty} \frac{1}{n}\|\Delta^t\|_2^2 &= 0, \tag{265}\\
\limsup_{n\to\infty} \frac{1}{n}\|s^t + \mu_t X\|_2^2 &< \infty, \tag{266}
\end{align}

where we let $\Delta^t = x^t - s^t - \mu_t X$.

We can prove this claim by induction on $t$. The base case of $t = -1, 0$ is trivial for all three claims: $s^0 + \mu_0 X = x^0$ and $\Delta^0 = 0$ are satisfied by our initial conditions. Now, assuming the claim holds for $\ell = 0, 1, \dots, t-1$, we prove the claim for $t$.

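By the pseudo-Lipschitz property and the triangle inequality, we have, for some $L$:

\begin{align}
\big|\psi(x_i^t, X_i, X(\varepsilon)_i) - \psi(s_i^t + \mu_t X_i, X_i, X(\varepsilon)_i)\big| &\le L\,|\Delta_i^t|\,\big(1 + |s_i^t + \mu_t X_i| + |x_i^t|\big) \tag{267}\\
&\le 2L\big(|\Delta_i^t| + |s_i^t + \mu_t X_i|\,|\Delta_i^t| + |\Delta_i^t|^2\big). \tag{268}
\end{align}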

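Consequently:

\begin{align}
\Big|\frac{1}{n}\sum_{i=1}^{n} \big[\psi(x_i^t, X_i, X(\varepsilon)_i) - \psi(s_i^t + \mu_t X_i, X_i, X(\varepsilon)_i)\big]\Big| &\le \frac{L}{n}\sum_{i=1}^{n} \big(|\Delta_i^t| + |s_i^t + \mu_t X_i|\,|\Delta_i^t| + |\Delta_i^t|^2\big) \tag{269}\\
&\le \frac{L}{n}\Big(\|\Delta^t\|_2^2 + \sqrt{n}\,\|\Delta^t\|_2 + \|\Delta^t\|_2\,\|s^t + \mu_t X\|_2\Big). \tag{270}
\end{align}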


Hence the induction claim Eq. (264) at t follows from claims Eq. (265) and Eq. (266) at t. 

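We next consider the claim Eq. (265). Expanding the iterations for $x^t$, $s^t$ we obtain the following expression for $\Delta_i^t$:

\begin{align}
\Delta_i^t ={}& \Big(\frac{\sqrt{\lambda}}{n}\langle f_{t-1}(x^{t-1}, X(\varepsilon)), X\rangle - \mu_t\Big) X_i + \frac{1}{\sqrt{n}}\big\langle Z_i,\, f_{t-1}(x^{t-1}, X(\varepsilon)) - f_{t-1}(s^{t-1} + \mu_{t-1} X, X(\varepsilon))\big\rangle \notag\\
& - b_{t-1} f_{t-2}(x_i^{t-2}, X(\varepsilon)_i) + \tilde{b}_{t-1} f_{t-2}(s_i^{t-2} + \mu_{t-2} X_i, X(\varepsilon)_i). \tag{271}
\end{align}

Here $Z_i$ is the $i$-th row of $Z$.

Now, with the standard inequality $(z_1 + z_2 + z_3)^2 \le 3(z_1^2 + z_2^2 + z_3^2)$: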


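\begin{align}
\frac{1}{n}\|\Delta^t\|_2^2 \le{}& L\Big(\frac{\sqrt{\lambda}}{n}\langle X, f_{t-1}(x^{t-1}, X(\varepsilon))\rangle - \mu_t\Big)^2 + \frac{L}{n^2}\|Z\|_2^2\,\big\|f_{t-1}(x^{t-1}, X(\varepsilon)) - f_{t-1}(s^{t-1} + \mu_{t-1} X, X(\varepsilon))\big\|_2^2 \notag\\
& + L\,|b_{t-1} - \tilde{b}_{t-1}|^2\,\frac{1}{n}\sum_{i=1}^{n} f_{t-2}(s_i^{t-2} + \mu_{t-2} X_i, X(\varepsilon)_i)^2 \notag\\
& + L\,|b_{t-1}|^2\,\frac{1}{n}\sum_{i=1}^{n} \big(f_{t-2}(s_i^{t-2} + \mu_{t-2} X_i, X(\varepsilon)_i) - f_{t-2}(x_i^{t-2}, X(\varepsilon)_i)\big)^2. \tag{272}
\end{align}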


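Using the fact that $f_{t-1}$, $f_{t-2}$ are Lipschitz:

\begin{align}
\frac{1}{n}\|\Delta^t\|_2^2 \le{}& L\Big(\frac{\sqrt{\lambda}}{n}\langle X, f_{t-1}(x^{t-1}, X(\varepsilon))\rangle - \mu_t\Big)^2 + \frac{L}{n}\|Z\|_2^2\,\frac{1}{n}\|\Delta^{t-1}\|_2^2 \notag\\
& + L\,|b_{t-1} - \tilde{b}_{t-1}|^2\,\frac{1}{n}\sum_{i=1}^{n} f_{t-2}(s_i^{t-2} + \mu_{t-2} X_i, X(\varepsilon)_i)^2 + L\,|b_{t-1}|^2\,\frac{1}{n}\|\Delta^{t-2}\|_2^2. \tag{273}
\end{align}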
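By the induction hypothesis (specifically with $\psi(x, y, r) = y f_{t-1}(x, r)$ at $t - 1$, wherein it is immediate to check that $y f_{t-1}(x, r)$ is pseudo-Lipschitz by the boundedness of $\mu_t, \sigma_t$):

$$\lim_{n\to\infty} \frac{1}{n}\langle X, f_{t-1}(x^{t-1}, X(\varepsilon))\rangle = \lim_{n\to\infty} \frac{1}{n}\langle X, f_{t-1}(s^{t-1} + \mu_{t-1} X, X(\varepsilon))\rangle = \frac{\mu_t}{\sqrt{\lambda}} \quad \text{a.s.} \tag{274}$$

Thus the first term in Eq. (273) vanishes. For the second term to vanish, using the induction hypothesis for $\Delta^{t-1}$, it suffices that almost surely:

$$\limsup_{n\to\infty} \frac{1}{n}\|Z\|_2^2 < \infty. \tag{275}$$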


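This follows from standard eigenvalue bounds for Wigner random matrices [AGZ09]. For the third term in Eq. (273) to vanish, we have by [JM13] that:

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} f_{t-2}(s_i^{t-2} + \mu_{t-2} X_i, X(\varepsilon)_i)^2 < \infty. \tag{276}$$

Hence it suffices that $\lim_{n\to\infty} (\tilde{b}_{t-1} - b_{t-1}) = 0$ a.s., for which we expand their definitions to get:

$$\lim_{n\to\infty} (\tilde{b}_{t-1} - b_{t-1}) = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \big[f_{t-1}'(s_i^{t-1} + \mu_{t-1} X_i, X(\varepsilon)_i) - f_{t-1}'(x_i^{t-1}, X(\varepsilon)_i)\big]. \tag{277}$$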


By assumption, $f_{t-1}'$ is Lipschitz and we can apply the induction hypothesis with $\psi(x, y, q) = f_{t-1}'(x, q)$ to obtain that the limit vanishes. Indeed, by a similar argument $\tilde{b}_{t-1}$ is bounded asymptotically in $n$, and so is $b_{t-1}$. Along with the induction hypothesis for $\Delta^{t-2}$, this implies that the fourth term in Eq. (273) asymptotically vanishes. This establishes the induction claim Eq. (265).

Now we only need to show the induction claim Eq. (266). However, this is a direct result of Theorem 1 of [JM13]: indeed in Eq. (261) we let $\tilde{\psi}(s, z, r) = (s + \mu_t z)^2$ to obtain the required claim.